ABSTRACT


The United States Army Recruiting Command requires tools to quantify the 
impact of factors in the recruiting environment, to identify differences in the recruiting 
processes across its five regional subordinate units, and to measure the effectiveness of its 
policies and resource expenditures. This thesis examines recruiting data for the “high- 
quality” male demographic from July 1992 to September 1997. It uses multivariate time 
series analysis to predict the number of enlistment contracts signed in a month as a 
function of fifteen exogenous and endogenous factors plus monthly indicators. A 
stepwise recursion using bootstrap simulation is developed to identify significant
factors in the multivariate time series. The significant factors in the reduced models are 
compared to those contained in models developed in previous studies. The models are 
also used to create nine-month projections of recruiting production, which are compared 
to known production figures from test set data to determine forecast accuracy. The results 
of this research support the intuition that the influential factors differ by region. The 
stepwise model reduction recursion using bootstrap simulation offers potential for further 
refinement and application. 
EXECUTIVE SUMMARY


The United States Army is experiencing the greatest recruiting shortages since the 
inception of the All-Volunteer Force in 1973. The service faces unprecedented 
competition for young people as unemployment is at its lowest level in thirty years and 
college attendance rates are the highest in American history. The U.S. Army Recruiting 
Command (USAREC) is the organization charged with recruiting civilians for service in 
the Army. USAREC requires tools to quantify the impact of factors in the recruiting 
environment, identify differences in the recruiting processes across its five regional 
subordinate units, and measure the effectiveness of its policies and resource expenditures. 
This thesis examines recruiting data from July 1992 to September 1997, which was a
very dynamic period for the Army and Recruiting Command. The scope is limited to the 
high-quality male demographic. The Army defines a high-quality recruit as one who 
scored above the 50th percentile on the Armed Forces Qualification Test and who is a 
high school graduate or general equivalency diploma holder. 

A considerable amount of research has been dedicated to the topic of Army 
recruiting. One of the goals of this thesis is to validate factors from previous models on 
more current data. Many observers have proposed new or changing influences on the 
recruiting environment. A further objective of this thesis is to explore these suppositions 
quantitatively by combining new factors with ones previously shown to be significant. 
Enumeration of the differences in the recruiting environment throughout the country is 
another objective. Finally, this thesis aims to develop an accurate tool for predicting 
recruiting production that can be used by Army leaders. 

Multivariate time series analysis is used to predict the number of enlistment 
contracts signed in a month as a function of exogenous and endogenous factors plus 
monthly indicators. Fifteen factors are initially included for examination in this study as 
predictive variables. They are selected based on their appearance in previous models or in 
recent research. Autoregressive moving average (ARMA) models are developed to 
produce residuals with a suitable structure for bootstrapping. The bootstrap is used to 
overcome the difficulties in determining significant factors presented by the short 
duration of the recruiting data time series. This technique allows resampling from within
the existing data to provide robustness in the factor determination process. A stepwise 
recursion is developed to eliminate factors from the time series models that are not 
statistically significant. The factors remaining in the reduced models are compared to 
those found to be significant in past research. The developed models are also used to 
create nine-month projections of recruiting production. The results are then compared to 
known production figures from test set data to determine forecast accuracy levels. 

The final models indicate that unemployment figures and high school graduate 
wage levels are significant factors for predicting recruiting production. These results are 
consistent with findings from previous studies. However, the impact of these two factors 
is not clearly interpretable across the five recruiting brigades. No consistent factors for 
measuring the competition between the Army and post-secondary schooling emerge from 
the model development process. The final models do successfully capture the seasonal 
nature of recruiting. There are considerable differences in the final model for each 
brigade, indicating that influential predictors of recruiting production differ regionally. 
The forecasts produced using the final models capture the general behavior of the 
recruiting production series in the test period. The stepwise recursion using bootstrap 
simulation for identifying significant factors in multivariate time series analysis proved to 
be a useful tool and offers potential for further refinement and application. 
I. INTRODUCTION


A. BACKGROUND 

The United States Army is currently experiencing the greatest recruiting shortages 
since the inception of the All-Volunteer Force. The service faces many challenges in the 
recruiting realm. Competition for young people is unprecedented as unemployment has 
recently reached 30-year lows. The nation’s youth have demonstrated a decreasing 
propensity to enlist as measured by the annual Youth Attitude Tracking Survey (YATS) 
and have enjoyed the highest college attendance rates in U.S. history (Parlier, 1999). As 
a result of these and other factors, the Army has failed to meet its recruiting requirements 
every year since 1997. 

The current recruiting conditions are in stark contrast to those of the early 1990s,
a period that represented unequalled success in terms of the quantity and quality of 
soldiers recruited by the Army. This achievement coincided with a decreased recruiting 
demand as the active Army force was pared down from its Cold War level of close to
750,000 to its current strength of approximately 480,000. As the force was reduced, 
recruiting requirements decreased 33% (Asch, 1999). Towards the conclusion of the 
drawdown, the Army entered a “steady state,” meaning that every soldier who left the 
service had to be replaced by a new recruit. Hence, accession requirements have actually 
increased slightly since 1995. 

In response to recent shortcomings in meeting recruiting objectives, the Army
Chief of Staff, General Shinseki, declared recruiting “the number one mission on his 
essential task list” (Dickey, 1999). The organization charged with the mission of 
recruiting civilians for service in the Army is the U.S. Army Recruiting Command 
(USAREC). To improve performance, USAREC has increased recruiter strength 15% 
since the beginning of 1997. It is also offering costly new enlistment incentives. 

USAREC is dedicated to matching people to Army personnel requirements. Like 
many high-tech organizations, the Army seeks to fill its ranks with “high-quality 
recruits.” The Army defines a high-quality recruit as one who scored above the 50th
percentile on the Armed Forces Qualification Test (AFQT) and who is a high school 
graduate or general equivalency diploma (GED) holder. Army policies over the past 
decade have required that 90 to 95% of all accessions have a high school diploma or 
GED. Since there are such demanding policy requirements for high quality recruits, this 
demographic category receives a majority of focus and recruiting effort. 

B. STATEMENT OF PROBLEM 

The troubled status of recruiting has gained public attention. The challenges in 
this arena are well documented and USAREC is applying more resources to achieve its 
objectives. Simple allocation of greater funds to USAREC alone is not the answer to the 
service’s manpower shortcomings. Precise application of these monies is critical. As 
pointed out by RAND researcher Bruce Orvis, “decisions about increases [must be] 
preceded by identification of specific shortages that need to be remedied” (Orvis, 1996).
In a period of increasing competition for eligible recruits, the Army’s challenges will not 
recede in the foreseeable future. Therefore, USAREC must operate with the greatest 
possible efficiency in its application of limited resources. 

Under these conditions USAREC requires tools to measure the effectiveness of its 
policies and resource expenditures and to apply an appropriate balance of effort across its 
five major subordinate units, which represent geographical regions of the United States. 
This thesis uses multivariate time series analysis to predict recruiting production (the 
number of enlistment contracts signed in a given period) as a function of exogenous and 
endogenous factors. 

1. Research Questions 

Models that address macro-level policies, recruiter distribution, and allocation of 
resources have utility for USAREC. The following research questions motivated the 
development of the time series models in this thesis: 




a. What are the most significant economic, demographic, and policy predictors of 
recruiting success? 

b. What are the differences between the five regional recruiting brigades regarding these 
various factors? 

c. How effectively can recruiting production be predicted using a multivariate time series 
model? 

2. Scope and Assumptions 

The models developed in this thesis predict production at the regional level. The 
scope of this study is limited to the high-quality male demographic, which is the single largest
category of recruits accessed. Though USAREC does recruit from U.S. territories and 
protectorates, as well as from within American military communities based overseas, this 
study addresses only recruiting efforts and production in the fifty states plus the District 
of Columbia.

The data used for this thesis was compiled by the Defense Manpower Data Center 
(DMDC) for the Navy College Fund Evaluation Study. The period examined is from July 
1992 to September 1997. The advantage of analyzing this period is that it was a time of
great change, which offers the potential to provide greater contrast in certain indicators
that can be exploited. The disadvantage is that it does not address current resource levels
and economic conditions. 

During the period analyzed by this study, USAREC underwent two major 
organizational changes with respect to its subordinate units, which are called brigades. 
Initially, it maintained a five-brigade structure. During 1992, it changed to a four-brigade 
structure, and then returned to five brigades in 1995. This study assumes that the structure 
of the brigades was constant, using the current five-brigade organization and its 
associated geographical boundaries. 
C. RESEARCH OBJECTIVES


A considerable amount of past research has been dedicated to the topic of Army 
recruiting. One of the goals of this thesis is to validate factors from previous models on 
more current data. Many observers have proposed new or changing influences on the 
recruiting environment. A further objective of this thesis is to explore these suppositions 
quantitatively by combining new factors with ones previously shown to be significant. 
Enumeration of the differences in the recruiting environment throughout the country is 
another objective. Finally, this thesis aims to develop an accurate tool for predicting 
recruiting production that can be used by Army leaders. 

D. ORGANIZATION 

This introduction provides the objectives and organization of this thesis. A 
detailed overview of Army recruiting and a review of previous research on this subject is 
contained in Chapter II. Chapter III describes the factors in the time series models and the 
motivation for their inclusion. The modeling methodology is developed in Chapter IV. 
Chapter V contains the results and a discussion of their implications. Finally, conclusions 
and recommendations are provided in Chapter VI. 
II. ARMY RECRUITING


A. OVERVIEW 

1. Mission and Structure 

The Army currently requires approximately 75,000 new soldiers a year. The 
Office of the Deputy Chief of Staff for Personnel (ODCSPER) determines this 
requirement and passes it to the U.S. Army Recruiting Command, the organization
responsible for recruiting civilians for service in the Army. Recruiting Command is 
organized into five subordinate brigades, which have general regional responsibilities as 
follows: northeast, southeast, north central, south central, and west. In many cases, states 
are divided between different regions. The current organization’s geographic boundaries 
are reflected in Figure 2.1.



Figure 2.1 U.S. Army Recruiting Command Structure 


USAREC has approximately 1,600 recruiting stations where the business of 
making contacts and signing contracts actually takes place. In addition to its 6,000 
recruiters, USAREC currently employs an additional 4,250 uniformed personnel and
1,100 civilians in research and support activities. Beyond people, one of its major 
resources is advertising. In 1997, USAREC spent approximately $87.9 million in national 
television, radio, and print advertising campaigns. This figure does not include additional 
funding that is provided to subordinate commanders for local advertising efforts. 

2. High-quality Recruit Definition 

The Army seeks to fill its ranks with “high-quality recruits.” Such people are 
trainable on technically oriented jobs. They also have higher contract completion rates, 
and greater retention for additional contracts. The Army defines a high-quality recruit as 
one who is in Test Score Category (TSC) I-IIIA, meaning he or she scored above the 50th 
percentile on the Armed Forces Qualification Test. Additionally, a high-quality recruit is 
in Educational Credential Tier 1, which means that he or she is a high school or GED 
diploma holder. The Department of Defense (DOD) and the Army state policy objectives 
for the number of TSC I-IIIA individuals recruited. U.S. law, DOD, and the Army set
successively higher minimum requirements for high school diploma holders. The Army
policy dictating Tier 1 accessions has varied between 90 and 95% of all accessions over 
the past decade. 
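
The two-part definition above can be expressed as a simple predicate. The following Python sketch is illustrative only: the function name, the argument names, and the encoding of the educational tier as an integer are assumptions made for this example and do not describe any Army or USAREC system. The threshold follows the wording of the text ("above the 50th percentile").

```python
def is_high_quality(afqt_percentile: int, education_tier: int) -> bool:
    """Return True when a recruit meets both parts of the high-quality definition.

    Hypothetical encoding: afqt_percentile is the AFQT percentile score and
    education_tier is the Educational Credential Tier as an integer, where 1
    denotes a high school diploma or GED holder, as described in the text.
    """
    in_tsc_i_to_iiia = afqt_percentile > 50  # "above the 50th percentile"
    is_tier_1 = education_tier == 1
    return in_tsc_i_to_iiia and is_tier_1


print(is_high_quality(72, 1))  # meets both conditions
print(is_high_quality(72, 2))  # fails the education condition
print(is_high_quality(45, 1))  # fails the test score condition
```

Both conditions must hold at once, which is why the high-quality category is narrower than either the TSC I-IIIA group or the Tier 1 group taken alone.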

3. Recruiting in the 1990s 

The 1990s represented a period of great change for the United States Army. The 
decade began with the Cold War victory and was followed in 1991 with the victory in the 
Gulf War. Since then, the Army has experienced increasing operational tempo with 
numerous peacekeeping and humanitarian deployments to Somalia, Haiti, Bosnia, and 
Kosovo, among others. Correspondingly, the 1990s represented a turbulent period for the 
Recruiting Command. 

The post-Cold War drawdown reduced the Army’s strength approximately 35%. 
From 1989 to 1998 accession requirements decreased by about the same degree (Asch, 
1999). The combination of a reduced demand for new soldiers, the military’s increased 
public popularity following the triumph in the Persian Gulf, and the 1992 economic 
recession allowed the Army to recruit the best educated force in its history. In 1991, 98%
of new soldiers were high school graduates (Eitelberg, 1994). Under these conditions,
Recruiting Command was able to cut recruiter strength 25% and reduce advertising 
budgets over 50% between 1989 and 1994 (Orvis, 1996). USAREC also reduced 
overhead in the organization of its subordinate commands. In 1992, it consolidated its 
subordinate units from a five-brigade to a four-brigade structure. 

Towards the conclusion of the drawdown, the Army entered a “steady state,” 
meaning that every soldier who left the service had to be replaced by a new recruit. As a 
result, accession requirements began to increase slightly in 1995. In response, USAREC 
returned to a five-brigade structure, and has increased recruiter strength 15% since 1997. 
It is now initiating a Corporal Recruiter Program to employ younger soldiers to better 
relate to its target audience (Dickey, 1999). USAREC is also offering shorter enlistment 
terms, and in 1999 began, for the first time, to combine enlistment bonuses with the 
Army College Fund. Despite these efforts, the Army has failed to meet its recruiting 
requirements every year since 1997. 

Several factors in the recruiting environment have contributed to the Army’s 
recent shortfalls. The lack of a military threat to the nation decreases the perceived need 
to serve. By 1999, the military-to-civilian pay gap had grown to 13.5%, its widest level 
since 1979 (Parlier, 1999). The country has experienced the lowest sustained 
unemployment rate in thirty years (Bureau of Labor Statistics, 1999). Finally, the 
increasing number of youth attending post-secondary education and the increasing 
financial return on a college degree are two related trends that are thought to have had a 
significant impact on the market for high-quality youth. In just a four-year period starting 
in 1990, the proportion of 18-19 year-old youths attending post-secondary education
increased 5% to 60.2%, and the proportion of 20-21 year-olds increased 13% to 44.9%
(Asch, 1999). Despite an increase in supply of college graduates, the wages they earn 
have continued to increase relative to those of high school graduates. This indicates that 
the demand for the skills that these graduates bring to the workplace continues to be 
greater than the supply. 
B. REVIEW OF PREVIOUS RECRUITING RESEARCH


A significant number of studies focus on predicting recruiting production for the 
All-Volunteer Force. This research provides insight into influential factors and the 
methods used to identify them. The following two studies do not include results from 
multivariate regression, but do provide useful background information on important 
variables and trends in Army recruiting. 

1. General Background Studies 

Orvis, Sastry, and McDonald’s 1996 Military Recruiting Outlook breaks the 
recruiting process into two major factors: “supply of potential enlistees” and “conversion of
potential supply” (Orvis, 1996). The researchers employ single-variable regression of 
specific indicators to identify trends in propensity and in conversion of potential supply. 
The authors determine that the predicted supply for FY 94 and 95, as measured by 
propensity of high-quality recruits to enlist from the YATS results, was actually greater 
than pre-drawdown levels of supply. This suggests that any shortfalls in recruiting for 
these two years resulted from the Army’s inability to convert supply to enlistments. The 
study reveals that the trend in propensity to enlist was decreasing, especially for 
minorities. The authors predict (accurately in retrospect) that by FY 97 this trend, 
combined with the increasing post-drawdown accession requirements, would result in the 
service facing a supply shortage in addition to its conversion difficulties. The study 
recommends further research to identify causal factors for the conversion shortcomings. 

Asch, Kilburn, and Klerman’s 1999 RAND study, Attracting College-Bound
Youth into the Military, suggests that recent recruiting shortcomings are a result of
permanent changes in the civilian labor market. Specifically, they state that the increase 
in the college premium, which is the difference between the average real wage of a 
college degree holder and that of a high school diploma holder, is driving more high- 
quality youth to seek post-secondary education. Hence, their research indicates that all 
services are increasingly competing against higher education and not the immediate labor 
market for TSC I-IIIA recruits. The researchers use existing economic models of 
recruiting supply and conduct statistical analysis of various factors to arrive at their
policy recommendations. 

2. Multivariate Time Series Studies 

The following five studies use multivariate regression analysis and/or time series 
regression analysis to examine factors affecting recruiting production. All five focus on
the high-quality enlistee category. 

Robert Cotterman wrote Forecasting Enlistment Supply: A Time Series of Cross 
Sections Model for a 1986 RAND study. The author develops a model that predicts 
monthly enlistment rates for each service in each state based on three empirical factors 
and 68 indicator variables. Cotterman uses monthly state-level data for each service over 
a 78-month period starting in 1974. One of the model’s distinguishing features is that the 
covariance structure allows correlation in disturbances across periods, across services,
and across- and within-state components. By using a time series of cross-sections the
author avoids collinearity problems associated with using purely time-series data 
(Cotterman, 1986). The first factor in the model represents the position in the business 
cycle by a measure of a state unemployment rate’s deviation from its trend. The second 
factor is a ratio of military compensation to manufacturing wages. The last empirical 
factor is a ratio of recruiting force strength to the target male population size. Indicator 
variables include month, state, and GI Bill availability. The model’s forecasts for FY 81 
differ from the actual results by 2% to 13%. It is most accurate for the Air Force. All 
predictors demonstrate expected behavior and unemployment is the most significant 
factor. The author concludes that the covariance structure developed in this model 
reduces the standard error of the estimates from those in earlier models. 

Lewis (1987) constructs a time-series of cross sections regression model of 30 
environmental factors on Army recruiting production for TSC I-IIIA males. Lewis groups
the factors into five major categories: economic, socio-demographic, recruiting resources, 
enlistment policies, and enlistment competition. The data used covers the period from 
FY80 to FY84 and is geographically based on 55 of the existing 56 recruiting battalions. 
The research concludes that the four most positive environmental factors are relative 
military pay, unemployment, recruiter strength, and advertising. The most negative
factors are minority representation in the population, college degree density, and the 
introduction of a less robust college fund program. 

Dertouzos and Polich’s 1989 RAND study, Recruiting Effects of Army
Advertising, is one of the first research projects to differentiate between various 
advertising media. Their study uses monthly data for a three-year period from 1981 to 
1984 for 66 geographical areas defined by the boundaries of the Military Entrance 
Processing Stations (MEPS). The model controls for economic and demographic 
conditions and intensity of recruiter effort. The dependent variable is high-quality 
enlistments predicted by the number of low-quality recruits, local supply factors 
(unemployment rate, manufacturing wages, recruiter strength, bonus programs), 
advertising intensity by medium, and recruiter activity. The most significant supply 
factors are recruiter strength and unemployment rate. The researchers compare the 
marginal return on advertising, recruiter staffing, and cash bonuses and conclude that 
advertising is the most cost-effective of these three resources. The study reveals that 
national magazines and local newspapers are the most effective media followed by 
national radio and network television. Dertouzos and Polich determine that the most cost- 
effective media are national magazine and newspaper advertisements. 

John Warner and Beth Asch summarize the results of a number of empirical 
models of enlistment supply in their 1995 paper, The Economics of Military Manpower. 
They state that there have been two generations of models since the beginning of the All-
Volunteer Force. The general form of the first models is $\ln H = \beta \ln X$, in which H
represents the number of high-quality enlistees, X represents a vector of supply
variables, and $\beta$ is the corresponding vector of coefficients. The advantage of the
logarithmic form is that the variable coefficients can be easily interpreted as “supply
elasticities” (Warner, 1995). The authors declare that the second generation of models
first appeared in 1986 and began to account for the behavior of recruiters. The form of
these models is $\ln H = \lambda \ln L + \beta \ln X + \ln E$, where H and X represent
the same elements as in the earlier models, L represents the number of low-
quality recruits, and E represents a measure of recruiter effort based on quotas. The
results of the second-generation models consistently indicate that unemployment rates
and relative civilian-to-military pay ratios are significant factors in the recruiting process. 
The authors conclude that the number of recruiters is the most significant recruiting 
resource factor. 
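
To illustrate why coefficients in a logarithmic supply model read as elasticities, the following Python sketch fits a one-variable log-log regression to synthetic, noise-free data. It is a minimal illustration under stated assumptions, not a reproduction of any model summarized by Warner and Asch: the function name, the data values, and the elasticity of 0.8 are all hypothetical.

```python
import math


def fit_log_log_slope(x, h):
    """Ordinary least squares slope of ln(h) on ln(x) for a single regressor."""
    lx = [math.log(v) for v in x]
    lh = [math.log(v) for v in h]
    mean_lx = sum(lx) / len(lx)
    mean_lh = sum(lh) / len(lh)
    sxy = sum((a - mean_lx) * (b - mean_lh) for a, b in zip(lx, lh))
    sxx = sum((a - mean_lx) ** 2 for a in lx)
    return sxy / sxx


# Synthetic data (hypothetical values): H = 100 * X^0.8, with no noise.
true_elasticity = 0.8
supply_variable = [4.0, 5.0, 6.0, 7.0, 8.0]
contracts = [100.0 * v ** true_elasticity for v in supply_variable]

beta_hat = fit_log_log_slope(supply_variable, contracts)

# Under the fitted model, a 1% rise in X scales H by 1.01 ** beta_hat,
# so the percent change in H is close to beta_hat itself.
pct_change_in_h = (1.01 ** beta_hat - 1.0) * 100.0
print(round(beta_hat, 6), round(pct_change_in_h, 3))
```

Because the slope is a ratio of proportional changes, coefficients estimated on variables with different units (rates, dollars, head counts) can be compared directly, which is the convenience the authors attribute to the logarithmic form.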

Dan Goldhaber published a critical review of the Navy’s Enlisted Goaling Model 
in a 1999 report for the Center for Naval Analyses. The Navy Recruiting Command
uses this model to predict high-quality recruit contracts on a quarterly basis. The Navy 
and Army definition of high-quality enlistees is the same. The model’s independent 
variables include recruiter strength, seasonally adjusted unemployment rates, a military- 
to-civilian pay ratio, YATS propensity figures, combined Army and Navy advertising 
expenditures, veteran population figures, and additional indicator variables for 
demographics, seasonality, and policy measures. In the model, all non-binary variables 
are in logarithmic form. The model uses an autoregressive form to account for correlation 
between recruiting production in successive quarters. The model’s predictions from 1994 
to 1999 are within 10 percent of actual production results. Goldhaber uses data from 1992 
to 1998 to analyze the structure and components of the model. He concludes that 
collinearity of the predictive variables did not cause bias in the predictions. He finds that 
the existing first-order autoregressive form of the model is appropriate. Finally, 
Goldhaber suggests that feedback from recruiting success influences advertising 
budgeting. Hence, he recommends removing advertising as a predictive variable to 
prevent potential biases in the coefficient estimates and the model predictions. 
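
The role of a first-order autoregressive term can be sketched with a generic regression of each period's value on the previous one. The Python example below is not the Enlisted Goaling Model's specification, which is not given in the source; the function name, the synthetic quarterly series, and the parameter values are assumptions chosen only to show how correlation between successive quarters is captured.

```python
def fit_ar1(series):
    """Least-squares estimate of (c, phi) in y[t] = c + phi * y[t-1]."""
    prev = series[:-1]
    curr = series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


# Hypothetical noise-free quarterly series generated with c = 200, phi = 0.6.
series = [1000.0]
for _ in range(11):
    series.append(200.0 + 0.6 * series[-1])

c_hat, phi_hat = fit_ar1(series)

# One-step-ahead forecast for the next quarter from the last observed value.
next_quarter = c_hat + phi_hat * series[-1]
print(round(c_hat, 6), round(phi_hat, 6), round(next_quarter, 2))
```

A coefficient phi between 0 and 1 means that an unusually strong or weak quarter carries over only partially into the next one.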

3. Summary 

These studies provide insight into what factors have been influential in predicting 
recruiting production in the past. Over the period encompassed by these works, the 
mission and composition of the Army has changed dramatically. The quality of the force 
as measured by the number of high school graduates enlisting has drastically improved, 
increasing from 16% in 1979 to over 90% throughout the 1990s. Despite these major 
changes, in all but one multivariate regression analysis, unemployment is the most 
influential predictor of recruiting production for high quality enlistees. In that one study, 
unemployment ranks second to recruiter strength as the most significant indicator. More
recent works suggest that competition with post-secondary education and not with the 
unskilled labor market is a factor with growing importance in predicting recruiting for 
high-quality youth. 

C. EXISTING USAREC PRODUCTION MODELS 

One of the forecasting tools USAREC uses is the Command Level Mission Model 
(CLEMM). It is a model that predicts production at the Battalion level as a function of 
major demographic indicators and recruiter intensity. In this model, recruiter intensity is 
measured by recruiter strength and operational policies (e.g. the number of recruiting 
workdays in a month, which can be controlled by varying the number of mandated 
working Saturdays). Historically, the model has had an accuracy rate within 5% for the 
TSC I-IIIA category, but is extremely labor-intensive to support. Though the model is 
still maintained by USAREC and used by the Enlisted Accessions Branch of the 
ODCSPER, USAREC has abandoned CLEMM in favor of a predictive model based on 
recent production performance (Pettit, 1999). Beyond CLEMM there are no large-scale 
models currently in use by USAREC that predict production by incorporating policy, 
resource, demographic, and economic predictors (Kaylor, 1999). 
III. DATA


A. FACTOR SELECTION AND DESCRIPTION 

Sixteen factors are initially included for examination in this study as predictive 
variables. They are selected based on their appearance in previous models or in recent 
research. The intent of selecting this large number of factors is to determine if factors 
included in older models are still significant indicators for predicting recruiting 
production and to determine if new factors postulated to be influential are in fact 
significant. Unless otherwise specified, all data was provided by the Defense Manpower 
Data Center (DMDC). This data was originally compiled for a study of the Navy College 
Fund being conducted by Dr. John Warner of Clemson University. The factors are 
described below. 

The first factor is mission. It reflects the numerical goal for contracts from male high school
graduates and high school seniors in AFQT categories I-IIIA (high-quality) set
by Recruiting Command Headquarters for its subordinate Recruiting Brigades to meet 
each month. It is selected to account for the effort that the production recruiters and their 
commanders expend in order to meet their assigned recruiting mission. It is also selected 
to implicitly account for incentives, awards, and bonuses offered to the recruiters on the 
assumption that the magnitude of rewards is adjusted to correspond with the demands of
the mission. 

The second factor selected is recruiter strength, which reflects the number of 
production recruiters assigned to each brigade. Production recruiters are the 
noncommissioned officers whose job is to make contacts with potential recruits and write 
enlistment contracts. Production recruiters represent a critical subgroup of the personnel 
assigned to Recruiting Command and in this context are distinct from commanders, 
staffs, and government civilians. In previous studies, recruiter strength has been identified 
as one of the most cost-effective factors affecting recruiting production.
The third, fourth, fifth, and sixth factors represent the impressions made by Army
advertising in television, radio, magazines, and newspapers respectively. These factors 
measure the total audience exposures for each national advertising campaign in a month. 
They do not reflect the impact of local area marketing programs. The television 
impressions are measured for advertising run on both network and cable programming. 
The newspaper impressions combine papers with national distribution along with college 
campus newspapers. The Army’s contracted advertising agencies provide impression 
figures for various demographic groups. For this study, the audience addressed is 15-24 
year-old males. 

The percent of eligible recruits receiving the Army College Fund (ACF) when 
enlisting is the seventh factor. The ACF provides an incentive for youth to join the Army 
with the promise of dedicated money for post-secondary education upon completion of 
service. The ACF is only available to potential recruits in TSC I-IIIA. It offers funding in 
addition to the Montgomery G.I. Bill, which is offered to all Tier 1 enlistees. Research 
conducted by Beth Asch in her 1999 RAND study suggests that as a greater proportion of 
young people attend college, this program may be increasingly effective in attracting 
college-bound youth to enlist. 

The eighth factor selected is the target population size. This figure represents the 
total males in AFQT categories I-IIIA in each recruiting region. This factor is included 
because Eitelberg and Mehay (1994) predict that a decreasing youth population will 
compound recruiting challenges. 

The unemployment rate is the ninth factor selected. This figure is the ratio of 
unemployed to the civilian labor force expressed as a percentage. The data is directly 
extracted from the Bureau of Labor Statistics’ Local Area Unemployment Statistics 
(BLS, Selective Data Access). The unemployment rate chosen is not seasonally adjusted, 
because the models developed in this thesis include indicator variables for month. 
Unemployment appears as a significant factor in all models reviewed. It represents the 
competition for youth that the service faces from the civilian labor market. 

The next factor is high school graduate wages, a measure of the average weekly 
wage earned by male high school graduates in each state. This data is extracted from the 
Monthly Current Population Survey, which is a joint project of the Bureau of Labor 
Statistics and the Census Bureau. Individuals surveyed are males age 18-35. To be 
included in the survey, an individual must normally work 30 or more hours per week. 
Like unemployment, youth wages are included in all of the previous 
models examined. 

The overall 17-21 year-old college attendance rate represents the eleventh factor. 
This rate is determined by dividing the college population for each state by the total 
youth population. These figures are extracted from data compiled by Woods and Poole 
Economics, Inc., an independent firm that produces county-level economic and 
demographic projections. DMDC provided this data for use in this thesis. The attendance 
rate in this model is not specific to gender or to the Army’s “high-quality” criterion. This 
factor is another measure of the competition for bright young people between the military 
and post-secondary education. 

The college premium represents the final factor addressing the Army’s 
competition with colleges for qualified personnel. In this thesis, the figure represents the 
difference in weekly wages between male high school graduates and college graduates. 
Like the tenth factor, it is derived from the Monthly Current Population Survey. 

The thirteenth, fourteenth, and fifteenth factors represent the monthly recruiting 
success, measured in signed contracts, of the Air Force, Marine Corps, and Navy in the 
high-quality male demographic. This category is included to determine if the relationship 
between the recruiting efforts of the Army and those of the other services is competitive 
or complementary. 

The final eleven factors are binary indicator variables for each of the months from 
February through December. January represents the baseline month and is not 
specifically represented by an indicator variable. 
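
As an illustration of this coding scheme, the following Python sketch builds the eleven indicators for a calendar month. The function name and the ordering of the indicators (February first, December last) are assumptions made for the example and are not taken from the S-Plus code used in this thesis.

```python
def month_indicators(month_number):
    """Return the 11 binary indicators (February..December) for a month 1-12.

    January is the baseline month, so it maps to a vector of all zeros.
    """
    if not 1 <= month_number <= 12:
        raise ValueError("month_number must be in 1..12")
    # Position 0 corresponds to February (month 2), position 10 to December.
    return [1 if month_number == m else 0 for m in range(2, 13)]


january = month_indicators(1)   # baseline: every indicator is zero
june = month_indicators(6)      # only the June indicator is set
```

Under this coding, each month coefficient is read as the shift in the response relative to January.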

B. DATA AGGREGATION 


The original data for the response variable and the fifteen non-month predictor 
variables came in various forms with respect to geographic and time divisions. The 
following table reflects the original form of each variable: 


Variable                                                    Index  Geographic  Time
                                                                   Division    Division
----------------------------------------------------------  -----  ----------  --------
Army high-quality male contracts                            NA     County      Monthly
Recruiting mission                                          1      County      Monthly
Recruiter strength                                          2      County      Monthly
TV advertising impressions                                  3      State       Monthly
Radio advertising impressions                               4      State       Monthly
Magazine advertising impressions                            5      State       Monthly
Newspaper advertising impressions                           6      State       Monthly
Percent of eligible recruits receiving the college option   7      State       Monthly
Target population size                                      8      County      Yearly
Unemployment                                                9      State       Monthly
High school graduate wages                                  10     State       Yearly
College attendance rate                                     11     State       Yearly
College wage premium                                        12     State       Yearly
Air Force high-quality male contracts                       13     County      Monthly
Marine Corps high-quality male contracts                    14     County      Monthly
Navy high-quality male contracts                            15     County      Monthly


Table 3.1 The Original Form of the Data for the Selected Variables. 


In order to develop a separate model for each recruiting brigade, the original data 
is aggregated geographically to reflect USAREC’s current regional boundaries. The 
response variable and six of the independent variables are enumerated at the county level 
in the data provided by DMDC. A brigade-to-FIPS county code file provided by 
USAREC is used to index the data for each of the 3,116 individual counties to its 
appropriate recruiting brigade. The data is then summed for each brigade for each month. 
For each of these seven variables there are 215,004 records (3,116 counties multiplied by 
the number of months in the series). Due to the large size of these files, the aggregation 
procedure is executed using SAS programs developed by Dennis Mar of the Naval 
Postgraduate School Systems Management Department. For each variable, the number of 
cases in which records have a FIPS identifier that does not exist in the indexing code is 
less than 0.10%. 
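
The county-to-brigade aggregation can be sketched in a few lines of Python. This is not the SAS program referred to above; it is a minimal illustration in which the record layout, the function name, and all codes and counts in the example are hypothetical.

```python
from collections import defaultdict


def aggregate_to_brigade(records, county_to_brigade):
    """Sum county-level monthly values into brigade-level monthly totals.

    records:           iterable of (fips_code, month, value) tuples
    county_to_brigade: dict mapping a FIPS county code to a brigade label
    Returns (totals, unmatched): totals maps (brigade, month) to the summed
    value, and unmatched counts records whose FIPS code is not in the index.
    """
    totals = defaultdict(float)
    unmatched = 0
    for fips, month, value in records:
        brigade = county_to_brigade.get(fips)
        if brigade is None:
            unmatched += 1          # FIPS identifier missing from the index
            continue
        totals[(brigade, month)] += value
    return dict(totals), unmatched


# Hypothetical example: codes, brigade labels, and counts are illustrative only.
index = {"01001": "2nd", "01003": "2nd", "06037": "6th"}
records = [
    ("01001", "1992-07", 4),
    ("01003", "1992-07", 6),
    ("06037", "1992-07", 25),
    ("99999", "1992-07", 1),        # not present in the index
]
totals, unmatched = aggregate_to_brigade(records, index)
```

Counting the unmatched records separately mirrors the check reported above, in which the share of records without a valid identifier is tracked for each variable.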

The data for the variables originally organized at the state level is converted to the 
appropriate regional structure using weights derived from the target population. The 
weights are calculated using 

$$\mathrm{weight}_{sr} = \frac{\mathrm{targetPopulation}_{sr}}{\mathrm{targetPopulation}_{s}},$$

where the subscript sr indicates the state, s, in region r. For states that are divided among 
regions, the numerator for the weight calculation is the portion of the state’s target 
population in each region. Once the weights are calculated, they are multiplied by the 
state figure for an independent variable. These values are then summed over states to 
determine a figure for the brigade, as shown: 

$$\mathrm{indepVariable}_{r} = \sum_{s} \left(\mathrm{weight}_{sr} \cdot \mathrm{indepVariable}_{s}\right).$$
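
A minimal Python sketch of this conversion follows. It assumes the weight is read as the share of a state's target population that lies in the region, which is one interpretation of the weight definition above; the function name, argument names, and all numbers are hypothetical.

```python
def region_figure(state_values, pop_in_region, pop_state):
    """Combine state-level figures into a single regional figure.

    state_values:  state -> value of the independent variable
    pop_in_region: state -> target population of that state lying in the region
    pop_state:     state -> total target population of the state
    """
    total = 0.0
    for state, portion in pop_in_region.items():
        weight = portion / pop_state[state]      # weight for state s in region r
        total += weight * state_values[state]    # weighted state figure
    return total


# Hypothetical example: state A lies wholly in the region, state B only partly.
state_values = {"A": 10.0, "B": 40.0}
pop_state = {"A": 100.0, "B": 200.0}
pop_in_region = {"A": 100.0, "B": 50.0}
figure = region_figure(state_values, pop_in_region, pop_state)
```

In this example state A contributes its full figure (weight 1.0) and state B contributes one quarter of its figure (weight 0.25).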

The four independent variables originally represented in annual time divisions are 
converted to monthly figures by developing a linear relationship between the data points. 
Monthly figures are determined by 

$$\mathrm{indepVariable}_{ym} = \mathrm{indepVariable}_{y} + \left(\frac{\mathrm{indepVariable}_{y+1} - \mathrm{indepVariable}_{y}}{12}\right) \cdot \mathrm{monthNumber},$$

where y represents year and m month. Figures for target population size and college 
attendance rates are calculated assuming that the original data points are for the month of 
January. The figures for high school graduate wages and college premium are calculated 
assuming that the original data points are for the month of June. The monthNumber factor 
reflects the months, ordered 1-12, between the original annual data points. This linear 
transformation of annual observations has the potential to produce additional noise in the 
process. Original data with monthly observations is preferable, but is unavailable. 
However, target population figures and college attendance rates are not expected to 
change significantly month-to-month. The greatest potential for error induction is for the 
college premium and especially the high school graduate wage level, both of which could 
potentially exhibit seasonal behavior. 
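
The annual-to-monthly conversion can be illustrated with a short Python sketch. The function name and the two annual wage figures are hypothetical; the calculation is a straight-line interpolation between consecutive annual observations, as described above.

```python
def monthly_from_annual(value_y, value_next, month_number):
    """Linearly interpolate between two consecutive annual observations.

    value_y:      annual observation for year y
    value_next:   annual observation for year y + 1
    month_number: 1..12, months elapsed since the year-y observation
    """
    return value_y + (value_next - value_y) / 12.0 * month_number


# Hypothetical annual figures: 400.0 in year y and 412.0 in year y + 1.
series = [monthly_from_annual(400.0, 412.0, m) for m in range(1, 13)]
```

Each interpolated month rises by the same increment (1.0 in this example), which is why seasonal movement within a year cannot appear in the converted series.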

C. EXPERIMENTAL DESIGN AND DATA SEGREGATION 

The original data represents a 63-month period from July 1992 through September 1997. 
The first 54 months are selected as an analysis data set for training the models. For all 
model development, T, the number of observations in a series, is initially equal to 54. The 
remaining nine months are reserved as the validation data set to test the accuracy of the 
models’ forecasts. The series extrema and averages for the full, test and training data sets 
are listed in Appendix A. 

IV. METHODOLOGY 


A time series is a sequence of observations made over time. The guiding principle 
behind time series analysis and forecasting is that the future can be predicted based on 
determination of patterns in past data (Bowerman, 1979). In multivariate time series 
analysis the analysis task is expanded to include the determination of the interrelationship 
between multiple series (Chatfield, 1996). Based on the relatively small size of the 
recruiting data set and the varying scales of the regressor values, determining which 
factors are significant is problematic. The means employed to overcome this difficulty 
is the bootstrap (Efron, 1998). It consists of resampling from within the existing data to 
provide robustness in the factor determination process. Use of the bootstrap technique 
requires that the residuals of a hypothesized model be independent. This necessitates the 
development of an autoregressive moving average (ARMA) model for each brigade. 

These requirements lead to the following methodology. First, select an 
appropriate time series model to produce residuals with structure suitable for 
bootstrapping. Second, develop the bootstrap recursion. Third, develop a recursion to 
conduct stepwise reduction of the model to identify significant factors. Fourth, develop 
and perform diagnostics on a final reduced model. Finally, use this model to predict 
future recruiting production and determine the accuracy of the predictions. The process 
described in this chapter addresses the steps to develop one model and one nine-month 
forecast. It is applied five separate times to develop a model and a forecast for each 
USAREC brigade. 


A. FITTING MULTIVARIATE ARMA MODELS 

For most measurements taken at fixed intervals over time, there is an underlying 
structure to the data. That is, there exists an association between one observation and its 
“neighbors.” One of the primary tasks of the analysis is to determine the strength of that 
relationship. A multivariate time series also accounts for the influence of regressor series 
on the response variable. Models of the following form capture these relationships for the 
recruiting data: 

$$y_t - \mu_y = \beta_1(\mathrm{Mission}_t - \mu_{x_1}) + \cdots + \beta_{21}\,\mathrm{December}_t + \phi_1(y_{t-1} - \mu_y) + \cdots + \phi_p(y_{t-p} - \mu_y) + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}, \tag{3.1}$$

where $y_t$ is the response variable, high-quality male contracts, at time t. The $\beta_i$ specify the 
relationship of the value of the regressors to the response variable. For both the response 
and the non-binary predictive variables the data must be centered on $\mu$, the mean value of 
the respective series. The autoregressive parameters, $\phi_p$, capture the strength of the 
relationship between the value of the response variable at time t and observations in 
previous periods. Uncertainty in the time series process is captured by $\varepsilon_t$, a “purely 
random disturbance term with a mean of zero and a variance of $\sigma^2$” (Harvey, 1994). The 
moving average parameters, $\theta_q$, represent the relationship between the response and these 
disturbances in previous periods. The number of autoregressive parameters, p, and the 
number of moving average parameters, q, define the order of an ARMA(p, q) model. In 
essence, the multivariate ARMA model for each brigade represents the deviation of the 
response variable at time t from the mean of its series by the deviation of the non-binary 
predictive variables from their respective series means, the magnitude of the monthly 
effect, plus the autoregressive and moving average effects. 

Equation 3.1 models the complete series of recruiting data. The values of the 
model parameters are estimated through analysis from a sample of this theoretically 
infinite series. A number of notational changes are made to distinguish these estimations 
from the parameters of the complete series. The observed mean values of the response variable 
and the predictive variables, $\mu_y$ and $\mu_{x_i}$, are represented by $\bar{y}$ and $\bar{x}_i$, where the index 
i = 1,2,...,15 runs over the indices listed in Table 3.1. The actual regression coefficients, 
actual autoregressive and moving average parameters, and disturbances are estimated by 
$\hat{\beta}_i$, $\hat{\phi}_p$, $\hat{\theta}_q$, and $\hat{\varepsilon}_t$, respectively. To simplify the representation of the terms in equation 
(3.1) and to address the distinctions of sample data, the following notation is introduced. 
The deviation of the response variable, high-quality male recruiting contracts, from the 
mean of its series is represented by 

$$z_t = y_t - \bar{y},$$

while the value of the response variable estimated by a model is represented by $\hat{z}_t$. 

The centered regressors are represented by 

$$s_{it} = (x_{it} - \bar{x}_i),$$

where the index i = 1,2,...,15 on each respective $x_{it}$ corresponds to the indices listed in 
Table 3.1. The models developed based on the sample data therefore have the form 

$$z_t = \hat{\beta}_1 s_{1t} + \cdots + \hat{\beta}_{21}\,\mathrm{December}_t + \hat{\phi}_1 z_{t-1} + \cdots + \hat{\phi}_p z_{t-p} + \hat{\varepsilon}_t + \hat{\theta}_1\hat{\varepsilon}_{t-1} + \cdots + \hat{\theta}_q\hat{\varepsilon}_{t-q}. \tag{3.2}$$


Models of the form reflected in equation (3.2) are created using the Gaussian 
maximum likelihood estimation method in the S-Plus statistical software package 
(Mathsoft Inc., 1999). Two criteria are used for selecting the appropriate ARMA model 
for each brigade. First, the model’s residuals must not display any significant 
autocorrelation or partial autocorrelation. Second, the model has to be of the lowest 
possible order while fitting the data well. 

The autocorrelation, $\rho_k$, represents the strength of the relationship between any 
two observations in a time series separated by a lag of k time periods. The autocorrelation 
is a dimensionless measure with values between −1 and 1. When $\rho_k$ is large in magnitude, 
observations k time units apart tend to move together in a linear fashion, and hence are 
not independent. The sign of $\rho_k$ indicates the direction of this movement. The partial 
autocorrelation, $\rho_{kk}$, represents the autocorrelation between any two observations 
separated by a lag of k after removing the effects of the intervening observations. The 
autocorrelation function (ACF) and partial autocorrelation function are lists of $\rho_k$ and $\rho_{kk}$ 
at lags k = 1,2,3,... (Bowerman, 1979). The graphs of these functions are called 
correlograms, which are the actual diagnostic tools used to assess the first criterion. An 
example correlogram with lags up to k = 24 is displayed in Figure 3.1. 

[Correlogram: H-Q Male Recruiting Production] 

Figure 3.1 Example Correlogram Produced Using S-Plus 
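
For readers who wish to reproduce the values plotted in a correlogram, the following Python sketch computes the sample autocorrelation at lags 1 through K using the usual estimator (lagged cross-products of deviations from the series mean, divided by the sum of squared deviations). The function name and the artificial seasonal series are assumptions for illustration; they are not the recruiting data.

```python
def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_1..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)                 # lag-0 sum of squares
    acf = []
    for k in range(1, max_lag + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k))
        acf.append(ck / c0)
    return acf


# Artificial series: a 12-month pattern repeated four times (48 observations).
pattern = [3, 5, 8, 12, 15, 20, 18, 14, 10, 7, 5, 4]
series = pattern * 4
acf = sample_acf(series, 12)
```

Because the artificial series repeats exactly every twelve observations, its autocorrelation at lag 12 is 36/48 = 0.75, the fraction of lagged pairs available; a seasonal series of real data shows a similar but less regular peak at the seasonal lag.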

The standard for determining significant ACF and partial ACF involves Bartlett’s 
approximation of the variance of autocorrelation estimates, var[$r_k$] (see Box and 
Jenkins, 1976, pp. 34-35). Values of the ACF at any lag k > 0 which fall outside 
$\pm 2\sqrt{\mathrm{var}[r_k]}$ are considered significant. This range is automatically calculated by 
S-Plus and indicated by the dotted horizontal lines in the generated correlograms. 
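
The significance range can be sketched in Python as follows. The sketch uses the general form of Bartlett's approximation, var[r_k] ≈ (1 + 2 Σ_{j<k} r_j²)/T, which reduces to 1/T at lag 1. Whether S-Plus draws this lag-dependent band or the constant band ±2/√T is not assumed here, and the function names and autocorrelation values are hypothetical.

```python
import math


def bartlett_bounds(acf, n_obs):
    """Return 2*sqrt(var[r_k]) for k = 1..len(acf) under Bartlett's approximation.

    acf:   sample autocorrelations r_1, r_2, ...
    n_obs: number of observations T in the series
    """
    bounds = []
    cumulative = 0.0                       # sum of r_j^2 for j < k
    for r in acf:
        var_rk = (1.0 + 2.0 * cumulative) / n_obs
        bounds.append(2.0 * math.sqrt(var_rk))
        cumulative += r * r
    return bounds


def significant_lags(acf, n_obs):
    """Return the lags k whose |r_k| exceeds the two-standard-error bound."""
    bounds = bartlett_bounds(acf, n_obs)
    return [k + 1 for k, (r, b) in enumerate(zip(acf, bounds)) if abs(r) > b]


# Hypothetical autocorrelations for a series of T = 54 observations.
example_acf = [0.5, 0.1]
bounds = bartlett_bounds(example_acf, 54)
flagged = significant_lags(example_acf, 54)
```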

Box and Jenkins define the notion of developing parsimonious ARMA(p, q) 
models, which means choosing the smallest values of p and q that adequately capture the 
nature of the time series. The Akaike information criterion (AIC) captures the essence of 
this concept. It provides a tool for comparing models, by indicating a model’s goodness 
of fit while penalizing complexity. Models are selected by minimizing the AIC, which is 
defined by 

$$\mathrm{AIC} = -2 \log L(\psi) + 2n.$$

In this equation, $L(\psi)$ is the maximized value of the likelihood and n is the sum of the 
ARMA model orders (p + q). Complex models have higher values of $L(\psi)$, hence a large 
negative first term, but are penalized for having a large value of n. Simple models have 
smaller values of $L(\psi)$ creating a smaller negative first term, but small values of n. The 
AIC is a means of balancing these competing characteristics (Harvey, 1994). 
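
A short worked example may help fix the trade-off. The Python sketch below evaluates the AIC as defined above for two candidate orders; the log-likelihood values are invented for illustration and do not come from any brigade model.

```python
def aic(log_likelihood, p, q):
    """AIC = -2 * log L + 2 * n, with n = p + q as defined in the text."""
    return -2.0 * log_likelihood + 2.0 * (p + q)


# Hypothetical maximized log-likelihoods for two candidate ARMA orders.
candidates = {
    (1, 0): -250.0,   # ARMA(1,0)
    (2, 1): -249.5,   # ARMA(2,1): slightly better fit, two more parameters
}
scores = {order: aic(loglik, *order) for order, loglik in candidates.items()}
best_order = min(scores, key=scores.get)
```

Here the ARMA(2,1) candidate improves the log-likelihood by only 0.5, which lowers the first term by 1.0 but raises the penalty by 4.0, so the simpler ARMA(1,0) model has the smaller AIC (502.0 against 505.0).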

B. BOOTSTRAP 

All of the ARMA models developed have random disturbance terms 
$(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T)$. In models that include an AR term, observations from the first p time 
periods are used to initiate the autoregressive process and no estimates from these periods 
are derived. Hence, the first disturbance term from an AR model is $\varepsilon_{p+1}$. Since the $\varepsilon_t$ 
represent random disturbance terms, by definition they are assumed to be independent. 
The ARMA model residuals are defined by 

$$e_t = z_t - \hat{z}_t,$$

in which $e_t$ is the deviation of the selected model’s response variable value from the 
original observation at time t. Like the random disturbance terms, the residuals are 
assumed to be independent based on the first modeling criterion. Because of their similar 
properties, the residuals are used to approximate the random disturbance terms. This 
concept is the basis for the development of the bootstrap method in this application. 

The bootstrap recursion first samples with replacement from the set of residuals 
$(e_{p+1}, e_{p+2}, \ldots, e_T)$ to develop a new set of residuals, $e_t^*$. Next, a new series of the response 
variable, $z_t^*$, is simulated using the new set of residuals to represent the random 
variations, so that $z_t^* = \hat{z}_t + e_t^*$. The final step is to refit an ARMA model of the same 
order as the original, using the original regressors paired with the simulated $z_t^*$ series. 
The simulated response variable values force slight changes in the estimates of the 
$\hat{\beta}_i$, $\hat{\phi}_p$, and $\hat{\theta}_q$. At the conclusion of the recursion, the regression coefficients are saved 
to a vector. The applicable autoregressive and/or moving average parameters are saved to 
their own vectors. 

Concatenation of the output vectors for multiple repetitions of the bootstrap 
recursion creates a matrix of regression coefficients. Each column of this matrix 
represents the developed regression coefficients for one factor. Analysis of the central 
tendency and variance of the figures in each column provides robustness in determining 
the value of the $\hat{\beta}_i$ and hence, the influence of each factor on the response variable. 

The method of generating the approximations to the random disturbance terms 
described above represents a non-parametric approach. The original distribution of the 
residuals used to represent the random disturbance terms is preserved in this method. An 
alternate approach is also developed in which the random variations are generated by 
sampling from a normal distribution with a mean of zero and a standard deviation equal 
to the standard deviation of the set of residuals, $(e_{p+1}, e_{p+2}, \ldots, e_T)$. The S-Plus code for 
executing this bootstrap recursion allows specification of whether to use the parametric or 
non-parametric approach for sampling from the residuals. The default method is 
non-parametric. 
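
The structure of the residual bootstrap can be sketched in Python. To keep the example self-contained, the ARMA refit is replaced by a one-regressor least-squares fit on centered data; that substitution, the function names, and the data values are assumptions for illustration only and do not reproduce the S-Plus code in Appendix B.

```python
import random
import statistics


def fit_slope(s, z):
    """Least-squares slope for z = beta * s on centered data (no intercept)."""
    return sum(a * b for a, b in zip(s, z)) / sum(a * a for a in s)


def bootstrap_slopes(s, z, n_boot=1000, parametric=False, seed=0):
    """Residual bootstrap: resample residuals, rebuild z*, and refit the model."""
    rng = random.Random(seed)
    beta_hat = fit_slope(s, z)
    fitted = [beta_hat * a for a in s]
    residuals = [zt - ft for zt, ft in zip(z, fitted)]
    resid_sd = statistics.stdev(residuals)
    slopes = []
    for _ in range(n_boot):
        if parametric:
            # Draw from a normal distribution with the residuals' spread.
            e_star = [rng.gauss(0.0, resid_sd) for _ in residuals]
        else:
            # Sample with replacement from the observed residuals.
            e_star = [rng.choice(residuals) for _ in residuals]
        z_star = [ft + e for ft, e in zip(fitted, e_star)]   # z* = fitted + e*
        slopes.append(fit_slope(s, z_star))                  # refit on z*
    return beta_hat, slopes


# Hypothetical centered regressor and response values.
s = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
z = [-8.5, -6.4, -2.7, 0.3, 3.2, 5.6, 9.1]
beta_hat, slopes = bootstrap_slopes(s, z, n_boot=200)
```

Each entry of `slopes` plays the role of one row of the coefficient matrix; with several regressors, each refit would return a vector and the rows would be stacked.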

The underlying theory for the development of this recursion is due to Efron and 
Tibshirani (1998, chapter 8). See Appendix B for the bootstrap recursion S-Plus code. 

C. STEPWISE REDUCTION 

The bootstrap recursion provides a tool for overcoming some drawbacks of 
having a limited number of time series observations from which to determine the 
relationship between the predictive and the response variables. The next stage of model 
development addresses the research objective of identifying the significant factors in 
predicting recruiting production. 

The underlying premise for the elimination of factors is as follows. If the mean 
value of a regression coefficient, $\hat{\beta}_i$, calculated from multiple iterations of the bootstrap 
is within a defined interval around zero, it can be interpreted as not significantly different 
from zero (Efron, 1998). If this is the case, the associated factor is not considered 
influential in predicting the behavior of the dependent variable. Such a factor can be 
eliminated from the model, which promotes model simplicity and improves model 
accuracy by removing noise associated with the discarded factor. 

Experimentation with the application of this idea for identifying non-significant 
factors provides valuable insight. Initially, factors that had means in a defined range 
around zero were removed all at once. The reduced ARMA models produced by this 
approach gave unpredictable results. Specifically, the regression coefficients for the 
remaining factors demonstrated sign changes from the original to the reduced model. 
This observation led to the development of a stepwise recursion to promote stability as 
the model is reduced. 

The intent of the recursion is to eliminate factors one-by-one until only significant 
factors remain in the model. The recursion first executes the bootstrap to develop a 
matrix of $\hat{\beta}_i$. Factors are identified as candidates for elimination if the mean of a column 
of regression coefficients lies within the range of 0 ± a·(standard deviation of the column 
of regression coefficients). The a term is an input parameter that controls the width of 
the interval. For each candidate factor, the proportion of the estimated $\hat{\beta}_i$ in the range 
0 ± a·(standard deviation of the column of regression coefficients) is calculated. The 
candidate factor that has the highest proportion is eliminated and a new ARMA model of 
the original order is refit. This recursion is repeated until no factors are eliminated. The 
final significant factors are then displayed. 
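
The elimination rule can be expressed compactly in Python. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the coefficient draws are invented, and the `run_bootstrap` argument stands in for a full bootstrap of the refit ARMA model, which is not reproduced here.

```python
import statistics


def pick_factor_to_drop(coef_matrix, names, a=1.0):
    """Return the candidate factor with the highest share of draws near zero.

    coef_matrix: list of bootstrap rows, one coefficient per factor in each row
    names:       factor names, in column order
    a:           tolerance multiplying the column standard deviation
    Returns None when no column mean lies within 0 +/- a * sd.
    """
    best_name, best_share = None, -1.0
    for j, name in enumerate(names):
        column = [row[j] for row in coef_matrix]
        half_width = a * statistics.stdev(column)
        if abs(statistics.mean(column)) > half_width:
            continue                                  # not a candidate
        share = sum(1 for b in column if abs(b) <= half_width) / len(column)
        if share > best_share:
            best_name, best_share = name, share
    return best_name


def stepwise_reduce(names, run_bootstrap, a=1.0):
    """Drop one factor per pass until no candidate remains."""
    names = list(names)
    while names:
        drop = pick_factor_to_drop(run_bootstrap(names), names, a)
        if drop is None:
            break
        names.remove(drop)
    return names


# Invented coefficient draws for three hypothetical factors A, B, and C.
draws = [
    {"A": 5.0, "B": 0.1, "C": 0.5},
    {"A": 5.2, "B": -0.2, "C": -1.5},
    {"A": 4.8, "B": 0.3, "C": 2.0},
    {"A": 5.1, "B": -0.1, "C": -0.4},
]


def fake_bootstrap(names):
    # Stand-in only: a real run would refit the reduced model and re-bootstrap.
    return [[d[n] for n in names] for d in draws]


first_drop = pick_factor_to_drop(fake_bootstrap(["A", "B", "C"]), ["A", "B", "C"])
remaining = stepwise_reduce(["A", "B", "C"], fake_bootstrap)
```

In this example both B and C are candidates on the first pass; B is removed first because three of its four draws fall inside the interval, against two of four for C.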

The S-Plus code for executing the stepwise reduction allows control of the 
number of iterations of the bootstrap recursion, the method of residual sampling 
(parametric or non-parametric), and a, the tolerance defining the size of the interval 
around zero. The accepted standard for repetition of the bootstrap is 1,000 iterations 
(Efron, 1998). Non-parametric residual sampling is employed and a value of 1 is used for 
a. The code for the stepwise reduction recursion is in Appendix B. 

D. FINAL MODEL DEVELOPMENT AND DIAGNOSTICS 


After the significant factors for a brigade are determined, the final reduced 
ARMA model is produced. It is of the same order as the original full model. 
Correlograms are plotted to ensure there is no significant ACF or partial ACF of the 
residuals. The AIC of the reduced model is calculated to ensure it is less than the AIC of 
the original full model. The regression coefficients of the factors remaining in the final 
model are then examined to ensure that they have the same sign as they did in the initial 
full model. The presence of a sign change is not necessarily an indication that the model 
is invalid, but it prompts scrutiny of the data and the factor reduction process. 

E. FORECASTING 

Once the final model for each region is determined, it is used to forecast the 
number of high-quality male contracts in the test period. Like the training data, the test 
data is first centered on the mean of the respective variable from the training series. 

A simulation is used to develop a predicted time series of the response variable. 
The length of the predicted time series is nine months, corresponding to the length of the 
test data period. The regressor values are from the test data. The simulation uses a 
parametric approach to the generation of the random errors. The mean of the random 
errors is zero and the standard deviation is equal to the standard deviation of the residuals 
from the final model. The simulation is repeated 1,000 times to develop multiple 
predictions for the nine-period series. The mean and standard deviation of the 1,000 
predictions for each month of these simulated series are used to make the forecast. The 
code for producing the forecasts is contained in Appendix B. 
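
A minimal Python sketch of this simulation is given below. It assumes, purely for illustration, a model with one centered regressor and a single autoregressive term; the function name, the coefficient values, and the regressor values are hypothetical and are not taken from any brigade model.

```python
import random
import statistics


def simulate_forecasts(beta, phi, s_test, z_last, resid_sd, n_sim=1000, seed=0):
    """Simulate n_sim forecast paths and summarize them month by month.

    Assumed model: z_t = beta * s_t + phi * z_{t-1} + e_t,
    with e_t drawn from a normal distribution N(0, resid_sd).
    Returns (means, sds), one entry per forecast month.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_sim):
        prev, path = z_last, []
        for s_t in s_test:
            z_t = beta * s_t + phi * prev + rng.gauss(0.0, resid_sd)
            path.append(z_t)
            prev = z_t                      # feed the simulated value forward
        paths.append(path)
    columns = list(zip(*paths))             # one column per forecast month
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return means, sds


# Hypothetical nine-month test period with a constant centered regressor.
means, sds = simulate_forecasts(beta=2.0, phi=0.5, s_test=[1.0] * 9,
                                z_last=0.0, resid_sd=3.0)
```

The monthly means serve as the point forecasts and the monthly standard deviations give the width of the band plotted around them.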

The forecast error for each observation in the 9-period time series is defined by 

$$\Delta_t = z_t - \hat{z}_{t(\mathrm{forecast})}.$$

Once calculated, diagnostics of the forecast errors are performed. The $\Delta_t$ values are 
plotted to determine if they appear randomly distributed. Correlograms of the forecast 
errors are plotted to determine if they exhibit significant ACF or partial ACF. 

In order to provide results that can be conveniently and directly compared to the 
original data, the centering procedure required for ARMA model development is reversed 
as follows: 

$$\hat{y}_{t(\mathrm{forecast})} = \hat{z}_{t(\mathrm{forecast})} + \bar{y}.$$

The percent error of the forecasts, defined as 

$$\mathrm{percentError} = \frac{y_t - \hat{y}_{t(\mathrm{forecast})}}{y_t} \cdot 100,$$

is also calculated for each of the nine forecast values. Finally, the actual behavior of the 
response variable, $y_t$, the forecast value, $\hat{y}_{t(\mathrm{forecast})}$, and the forecast plus and minus one 
standard deviation of the forecast are plotted to provide a visual tool for interpreting the 
forecast’s accuracy. 
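
The two final calculations, reversing the centering and computing the percent error, are simple enough to show as a worked Python example. The function names and all numbers are hypothetical.

```python
def uncenter(z_forecast, y_bar):
    """Add the training-series mean back to centered forecasts."""
    return [z + y_bar for z in z_forecast]


def percent_error(actual, forecast):
    """Percent error (y_t - forecast_t) / y_t * 100 for each period."""
    return [(y - f) / y * 100.0 for y, f in zip(actual, forecast)]


# Hypothetical values: training mean of 1000 contracts, two forecast months.
y_forecast = uncenter([50.0, -100.0], y_bar=1000.0)
errors = percent_error([1000.0, 1000.0], y_forecast)
```

With these numbers the forecasts are 1050 and 900 contracts, giving percent errors of -5.0 and 10.0; a negative value indicates a forecast above the actual production.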

V. RESULTS 


Models are developed that include all fifteen factors plus monthly indicators as 
described in Chapter III. Based on the recommendations made by Goldhaber in his 1999 
review of the Navy’s Enlisted Goaling Model, models that exclude all advertising data 
are also developed. The models that do not include advertising impression data are more 
accurate for forecasting as measured by the percent error of the forecasts. Therefore, all 
model results described in this chapter refer to models that initially excluded advertising 
impression data. The behavior of the advertising time series is still examined in the 
Descriptive Statistics section. 

A. DESCRIPTIVE STATISTICS 

1. Time Series Graphs 

A basic step in the analysis of time series is plotting the data to identify trends, 
outliers, seasonality, and other cyclic changes (Chatfield, 1996). The series for the 
variables in each brigade are reflected in Figures 5.1 through 5.5. In all these graphs, the 
training and test series are plotted as one series. A vertical line between December 1996 
and January 1997 indicates the division between these two sets. Note that the linear 
construct of the four independent variables converted from annual to monthly data, 
(target population size, college attendance rates, high school graduate wages, and college 
premium), precludes observation of seasonal behavior. 

The first seven time series graphs address eight variables specific to Army 
recruiting and USAREC policies. The Army’s high-quality male recruiting shortages are 
clearly reflected by the increasing difference between the recruiting mission and 
production in each brigade. Recruiting production demonstrates clear seasonality with 
peaks each June. Recruiter strength does not appear seasonal, but assignments increase 
noticeably in all brigades beginning in early 1997. All advertising media display a 
decreased number of impressions around the months of June and July. Television and 
radio demonstrate increasing trends, while magazine and newspaper impressions lack 
clear trends. The percent of youth receiving the Army College Fund exhibits no 
discernible trend or seasonality. 

Factors of the recruiting environment are captured in the next five time series 
graphs. In all brigades, target population figures initially demonstrate a decrease between 
0.2% and 2.9%. However, in all but the First Brigade region, there is a net growth in the 
target population between 1992 and 1997. Relative growth is greatest in Second Brigade, 
followed closely by Sixth Brigade (10.1% and 9.9% respectively). In all regions, 
unemployment displays clear seasonality with peaks in January and June and an overall 
decreasing trend. High school graduate wages demonstrate a steady increase over time. 
The behavior of college attendance rates differs between brigades. All regions show a dip 
in attendance in 1994. In the Fifth and Sixth Brigade regions, there is a net decrease in 
the college attendance rate over the period examined, while the other three regions 
experience a net increase. None of the attendance rate changes are more than two 
percentage points. The college premium exhibits a net increase in all brigades, though the 
behavior varies by region. The relative increase is largest in Third Brigade (43%) and 
smallest in Second Brigade (10%). 

The final time series graphs address the behavior of rival services’ recruiting 
production in the high-quality male demographic. The Air Force, Marine Corps, and 
Navy all display a decreasing trend until about 1994, after which the mean of each series 
appears fairly constant. All the rival services demonstrate seasonal summer peaks, though 
their occurrence seems to vary by one to two months, with the Air Force’s peak occurring 
later in the summer. 

Figure 5.1 First Brigade Factor Time Series 

Figure 5.2 Second Brigade Factor Time Series 

Figure 5.2 Continued Second Brigade Factor Time Series 

Figure 5.3 Third Brigade Factor Time Series 

Figure 5.3 Continued Third Brigade Factor Time Series 

Figure 5.4 Fifth Brigade Factor Time Series 

Figure 5.4 Continued Fifth Brigade Factor Time Series 

Figure 5.5 Sixth Brigade Factor Time Series 

2. Time Series Structure of the Variables 

The correlograms for the variable series in all brigades are contained in Appendix 
C. They prove valuable for confirming the observations of seasonality discerned from the 
time series graphs and also for identifying cycles that are not visually evident. Only 
departures from graphical observations, additional insights, and unexpected results are 
addressed in this section. 

The ACF of the mission variable series decays to an insignificant level in a lag of 
k =5 or less in all brigades. This behavior is somewhat unexpected. It suggests that 
USAREC did not issue contract missions to its subordinate brigades in accordance with 
the known seasonal behavior of recruiting production. 

TV advertising impressions demonstrate seasonality with significant annual ACF 
and partial ACF figures in all brigades. Magazine ad impressions reflect clear biannual 
peaks in ACF at 6- and 12-month lags. Radio advertising impressions are not consistent 
across the country. In Second and Third Brigades, the autocorrelation functions are 
significant at k=6. First Brigade has a significant positive ACF at a lag of 12 months. 
Fifth and Sixth Brigades have a significant positive ACF at both k=6 and k=12. 
Newspaper advertising impressions display no evidence of periodicity from the ACF. 

In all regions except Sixth Brigade, Air Force high-quality male contracts 
demonstrate significant peaks in ACF at lags of 6 and 12 months. In the Sixth Brigade 
region, Air Force recruiting shows a significant peak in ACF only at 6 months, which is 
unexpected. The Marine Corps’s production in this demographic reflects the strongest 
seasonal behavior of any service, with very significant ACF at lags of 12 and 24 months 
and, in all but the Third Brigade region, clearly significant ACF at a lag of 36 months. 
The ACFs’ behavior also confirms the fact that the Marine Corps’s high-quality male 
production appears the most consistent of all the services with little trend and very clear 
seasonality. In all but the Sixth Brigade region, the ACF for the Navy’s high-quality male 
recruiting series demonstrates a much slower decay than the other services, indicating a 
less seasonal behavior. However, the Navy production correlograms do still have peaks
in the ACF at a lag of k = 12.

3. Correlation Between Time Series

The time series in each brigade are examined for high values of simple correlation
between variables. Variables with high correlation present the potential for
multicollinearity, which can cause unstable models. The threshold for high correlation in this
study is defined as ρ > 0.95. Identification of correlation values above this level does not
represent a criterion for excluding variables from the initial model. However, the final
reduced models are examined to ensure that both variables from pairs with high
correlation values are not present.
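
The screening step described above can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the thesis: the function name and the synthetic series are hypothetical, and taking the absolute value of the correlation is an assumption that goes beyond the stated rule of ρ > 0.95.

```python
import numpy as np

def high_correlation_pairs(data, names, threshold=0.95):
    """Return (name_i, name_j, r) for column pairs with |r| above threshold."""
    corr = np.corrcoef(data, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            # Assumption: strong negative correlation is flagged as well.
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 3)))
    return pairs

# Synthetic 54-month series (not thesis data): two trending variables, one cyclic.
t = np.arange(54)
wage = 10.0 + 0.1 * t
college = 0.60 + 0.002 * t + 0.005 * np.sin(t)
unemp = 6.0 + np.cos(2 * np.pi * t / 12)
data = np.column_stack([wage, college, unemp])
pairs = high_correlation_pairs(data, ["HSGRADWAGE", "GOTOCOLRATE", "UNEMP"])
print(pairs)
```

Two series built by interpolating between annual observations both follow smooth trends, which is one reason such a pair can show very high simple correlation.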

In First Brigade, the correlation between high school graduate wages and the 
college attendance rate is 0.97. In Third Brigade, the correlation between these same 
variables is 0.95. These are the only two cases of correlation above the designated 
threshold. Both variables in this pair are from data that was transformed into monthly
figures by interpolating linearly between annual observations. In both First and Third Brigades, the
college attendance rate variable is eliminated during the model reduction process. 

4. Centered Data 

In order to meet the requirement that an AR model must be developed for a zero
mean series, the data for each variable is centered on the respective training set mean,
ȳ or x̄. Not knowing in advance the order of the ARMA model that will be most
effective, the centered data is used for all model development. The series averages and
extrema for all variables in the full, test, and training data sets for each brigade are listed
in Appendix A.
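
A minimal Python sketch of the centering step follows. It assumes a 54-month training set followed by a nine-month test set, which matches the study period; the function name and the example series are hypothetical.

```python
import numpy as np

def center_on_training_mean(series, n_train):
    """Subtract the training-set mean from the whole series (training and test)."""
    x = np.asarray(series, dtype=float)
    train_mean = x[:n_train].mean()
    return x - train_mean, train_mean

# Hypothetical 63-month series: 54 training months followed by 9 test months.
series = np.arange(63, dtype=float)
centered, train_mean = center_on_training_mean(series, n_train=54)
print(train_mean, centered[:54].mean())
```

Because only the training mean is used, the test months are shifted by a constant that is known at forecast time, and adding `train_mean` back recovers values on the original scale.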

B. MODEL DEVELOPMENT 

1. Initial Model Selection 

Strictly autoregressive (AR), strictly moving average (MA), and mixed 
autoregressive moving average (ARMA) models are all explored during model 
development. Additionally, different manipulations of the centered data are explored 
including a one-period lead of all predictive variables and logistic transformations. 
Moving average models using the centered data prove the most effective for creating
models of the lowest order that have no significant ACF or PACF of the residuals.

The selection criteria for the moving average models are contained in Table 5.1.
In this table, an asterisk in the “Moving Average Order” column indicates the model
selected for stepwise reduction.


Brigade        Moving Average   Significant ACF or PACF                      AIC
               Order (q)        of residuals

1st Brigade    1 *              no                                           600.4
               2                no                                           601.4

2nd Brigade    1                no                                           639.4
               2                no                                           634.0
               3                no                                           633.5
               4 *              no                                           623.8
               5                no                                           628.1

3rd Brigade    1                no                                           658.8
               2 *              no                                           641.6
               3                yes - PACF at lag k = 8                      628.9

5th Brigade    1                yes - ACF at lag k = 2, PACF at lag k = 4    589.5
               2                yes - ACF at lag k = 2, PACF at lag k = 4    580.3
               3                no                                           564.3
               4                no                                           550.5
               5 *              no                                           [illegible]
               6                no                                           572.4

6th Brigade    1                yes - PACF at lag k = 7                      624.4
               2                yes - PACF at lag k = 1                      625.2
               3                yes - PACF at lag k = 4                      612.9
               4                yes - ACF at lag k = 4, PACF at lag k = 4    599.5
               5                yes - PACF at lag k = 6                      [illegible]
               6 *              no                                           591.3
               7                no                                           592.8


Table 5.1 Model Selection Criteria Measures
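
The starred choices in Table 5.1 are consistent with picking, among the orders whose residuals show no significant ACF or PACF, the one with the lowest AIC. That rule is an inference from the legible rows of the table rather than a procedure stated in the text, and the function and field names below are hypothetical. The values are the Second and Third Brigade rows of Table 5.1.

```python
def select_order(candidates):
    """Lowest-AIC order among candidates with clean residual ACF and PACF."""
    clean = [c for c in candidates if not c["significant_residual_acf"]]
    if not clean:
        return None
    return min(clean, key=lambda c: c["aic"])["q"]

second_brigade = [
    {"q": 1, "significant_residual_acf": False, "aic": 639.4},
    {"q": 2, "significant_residual_acf": False, "aic": 634.0},
    {"q": 3, "significant_residual_acf": False, "aic": 633.5},
    {"q": 4, "significant_residual_acf": False, "aic": 623.8},
    {"q": 5, "significant_residual_acf": False, "aic": 628.1},
]
third_brigade = [
    {"q": 1, "significant_residual_acf": False, "aic": 658.8},
    {"q": 2, "significant_residual_acf": False, "aic": 641.6},
    {"q": 3, "significant_residual_acf": True, "aic": 628.9},
]
print(select_order(second_brigade), select_order(third_brigade))
```

In the Third Brigade case the order with the lowest AIC is passed over because its residuals retain a significant PACF value.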
The histograms contained in Figure 5.6 show that the assumption of a normal
distribution of errors is plausible. First, Third, and Fifth Brigades’ residual histograms are
negatively skewed, while Second and Sixth Brigades’ are positively skewed.

Figure 5.6 Residual Distributions for the Initial Models in Each Brigade


2. Stepwise Model Reduction 

Reduced models are produced with the stepwise recursion using both the 
parametric and non-parametric means of sampling from the residual errors. The 
difference in which factors appear in the final models produced by each method is 
minimal. Only one variable in one of the five brigades differs between the two 
techniques. This result supports the supposition that the residuals from the initial models 
have normal distributions. The results of the non-parametric application of the stepwise 
reduction recursion are contained in Tables 5.3 and 5.4. The first table lists the sign of the 
regressor and moving average coefficients of the initial full models and those of the
significant factors in the reduced models. Table 5.4 lists the sign and magnitude of the 
regressor and moving average coefficients in the reduced models. 
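
The two ways of sampling from the residual errors can be sketched in Python as below. The sketch mirrors the two branches of the S-Plus listing in Appendix B (normal draws scaled by the residual standard deviation versus resampling the residuals with replacement); the function name and the example residuals are hypothetical.

```python
import numpy as np

def draw_innovations(residuals, rng, parametric=False):
    """Return one bootstrap innovation series of the same length as residuals."""
    r = np.asarray(residuals, dtype=float)
    if parametric:
        # Parametric: normal draws with the sample standard deviation of the residuals.
        return rng.normal(loc=0.0, scale=r.std(ddof=1), size=r.size)
    # Non-parametric: resample the observed residuals with replacement.
    return rng.choice(r, size=r.size, replace=True)

rng = np.random.default_rng(0)
residuals = np.array([-3.0, -1.0, 0.5, 1.5, 2.0])  # made-up values, not thesis data
nonparam = draw_innovations(residuals, rng, parametric=False)
param = draw_innovations(residuals, rng, parametric=True)
print(nonparam, param)
```

If the residuals are close to normal, the two schemes generate similar simulated series, which is consistent with the near-identical reduced models reported above.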
[The per-brigade sign columns (full and reduced model for each of the 1st, 2nd, 3rd, 5th, and 6th BDE) and the moving average coefficient rows of this table are not legible in the source. The summary columns and the per-brigade totals are recoverable and are listed below.]

Index  Factor         # BDEs factor   # BDEs     # BDEs
                      significant     positive   negative

1      MISSION        3               3          0
2      RECRUITERS     3               1          2
3      TVADS          -               -          -
4      RADIOADS       -               -          -
5      MAGADS         -               -          -
6      NEWSPADS       -               -          -
7      COLOPTION      3               2          1
8      TGTPOP         2               0          2
9      UNEMP          5               1          4
10     HSGRADWAGE     5               2          3
11     GOTOCOLRATE    2               2          0
12     COLLPREM       3               2          1
13     AFHQMC         3               2          1
14     MCHQMC         4               2          2
15     NHQMC          4               4          0
16     FEB            2               0          2
17     MAR            2               0          2
18     APR            3               0          3
19     MAY            3               0          3
20     JUN            5               5          0
21     JUL            3               2          1
22     AUG            2               2          0
23     SEP            4               4          0
24     OCT            2               2          0
25     NOV            1               0          1
26     DEC            5               0          5

Significant factors      1st BDE   2nd BDE   3rd BDE   5th BDE   6th BDE

Total                    16        15        9         15        14
Non-month                7         5         7         9         9
Month                    9         10        2         6         5


Table 5.3 Sign of Significant Factors and Moving Average Coefficients for Full and
Reduced Models
Final Models for Centered 54-month Data

[Columns in the source: sign and magnitude for the 1st, 2nd, 3rd, 5th, and 6th BDE. Empty cells were lost in extraction, so a row with fewer than five entries lists its legible entries in left-to-right order without brigade labels. Rows 24 through 26 and the moving average coefficient rows are not legible.]

Index  Factor         Legible entries (sign and magnitude)

1      MISSION        +1.76E-01, +1.96E+01, +2.77E-01
2      RECRUITERS     -3.23E-01, +4.59E-01, -1.61E-01
3      TVADS          (none)
4      RADIOADS       (none)
5      MAGADS         (none)
6      NEWSPADS       (none)
7      COLOPTION      -5.07, +5.77, +3.09
8      TGTPOP         -0.00222, -0.00987
9      UNEMP          +6.28E+01, -9.74E+01, -2.65E+01, -6.92E+01, -4.35E+01
10     HSGRADWAGE     +9.10E+00, -1.09E+01, -2.94E+00, -1.49E+00, +3.70E+00
11     GOTOCOLRATE    +675, +265
12     COLLPREM       -5.08E+00, +1.15E+00, +1.90E+00
13     AFHQMC         +78.8, -0.477, +0.532
14     MCHQMC         -5.01E-01, +4.05E+01, -4.52E-01, +5.05E-01
15     NHQMC          +3.13E-01, +4.02E+01, +1.22E+00, +5.55E-01
16     FEB            -68.4, -56.5
17     MAR            -4.95E+01, -7.19E+01
18     APR            - [illegible], - [illegible], -9.06E+01
19     MAY            -5.13E+01 (other entries illegible)
20     JUN            + [illegible], + [illegible], + [illegible], +2.84E+02, +3.49E+01
21     JUL            [illegible], +1.20E+02, -2.99E+01
22     AUG            [illegible], +6.09E+01
23     SEP            [illegible], [illegible], +4.33E+01, +2.80E+01
24     OCT            [illegible]
25     NOV            [illegible]
26     DEC            [illegible]


Table 5.4 Sign and Magnitude of the Reduced Model Regressors and Moving
Average Coefficients


The total number of significant factors in each brigade varies between nine and 
sixteen. The number of significant continuous predictor variables in a brigade ranges 
from five to nine, while the number of significant monthly indicators varies between two 
and ten. All of the continuous variables are significant in at least two brigades, and all of 
the monthly indicators are significant in at least one brigade. Interpretation of the
significant regressors is addressed in Section D. 

3. Reduced Model Diagnostics 

None of the residuals from the final reduced models display significant ACF or 
partial ACF. In all cases, the reduced models have a lower AIC value than the initial full 
models, as reflected in Table 5.5. Because the order of the models does not change during 
the reduction process, the smaller AIC values stem from improvements in model
accuracy. 


Brigade        Initial Full Model AIC    Reduced Model AIC

1st Brigade    600                       591
2nd Brigade    623                       611
3rd Brigade    642                       642
5th Brigade    [illegible]               546
6th Brigade    591                       577


Table 5.5 Akaike Information Criteria Values for the Full and Reduced Models 


C. FORECASTING 


1. Nine-month Forecasts 

The test set data and final reduced models are used in a simulation to produce a 
nine-month predicted time series of the centered response variable, z_t. The percent error
of the forecast for each month, as defined in Chapter IV, is calculated and reflected in 
Table 5.6. The figures marked with an asterisk represent the cases in which the 
confidence interval contains the known response variable value. 


BDE    Projected Month
       1        2        3        4        5        6        7        8        9

1      --       -17.6    -22.4    --       -16.8    -13.6    -1.8*    -8.2     -13.7
2      --       4.8*     -14.7    --       12.7*    23.4     --       --       12.3
3      --       --       --       --       --       --       -6.2*    --       --
5      --       --       --       -3.9*    -21.9*   --       19.5*    9.9      --
6      -5.2*    18.2     18.1     15.7     1.0*     15.9     24.7     23.7     29.4

(-- marks an entry that is illegible in the source.)


Table 5.6 Percent Error for Each Period of the Nine-Month Forecasts 


The mean predicted value of the response variable, y_t (forecast), and a confidence
interval of +/- 1 standard deviation of each forecast are plotted along with the known
values of the response variable. The last three months of the training period and the
predicted time series for each brigade are reflected in Figures 5.7 through 5.11. These
graphs reveal that the forecasts do capture the general behavior of recruiting production
during the test period.
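
The forecast summaries described in this section can be sketched in Python as follows. The percent error is defined in Chapter IV, which is not reproduced here, so the form used below, 100 * (forecast - actual) / actual, and its sign convention are assumptions. The function name and the numbers are hypothetical.

```python
import numpy as np

def summarize_forecast(simulated, actual):
    """Mean forecast, +/- 1 SD band, percent error, and interval coverage per month."""
    sims = np.asarray(simulated, dtype=float)   # shape: (simulations, months)
    actual = np.asarray(actual, dtype=float)
    mean = sims.mean(axis=0)
    sd = sims.std(axis=0, ddof=1)
    lower, upper = mean - sd, mean + sd
    # Assumed definition of percent error; see the lead-in above.
    pct_err = 100.0 * (mean - actual) / actual
    inside = (actual >= lower) & (actual <= upper)  # the asterisk cases in Table 5.6
    return mean, lower, upper, pct_err, inside

# Two made-up simulated paths over two months, with made-up actual values.
mean, lower, upper, pct_err, inside = summarize_forecast(
    [[90.0, 110.0], [110.0, 130.0]], [105.0, 150.0])
print(mean, pct_err, inside)
```

A month can have a small percent error and still fall outside the band when the simulated forecasts are tightly clustered, so the two measures are worth reading together.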

[Plot: First Brigade Nine-Month Forecast, January 1997 to September 1997]

Figure 5.7 First Brigade Forecast

[Plot: Second Brigade Nine-Month Forecast, January 1997 to September 1997]

Figure 5.8 Second Brigade Forecast

[Plot: Third Brigade Nine-Month Forecast, January 1997 to September 1997; vertical axis: Recruiting Contracts]

Figure 5.9 Third Brigade Forecast

[Plot: Fifth Brigade Nine-Month Forecast, January 1997 to September 1997]

Figure 5.10 Fifth Brigade Forecast

[Plot: Sixth Brigade Nine-Month Forecast, January 1997 to September 1997; vertical axis: Recruiting Contracts; horizontal axis: months, October 1996 to September 1997]

Figure 5.11 Sixth Brigade Forecast

2. Forecast Error Diagnostics

The forecast errors, A_t, are calculated and plotted to determine if they appear
randomly distributed. The forecast error plots are shown in Figure 5.12.

Figure 5.12 Plot of the forecast errors for each brigade


The limited number of forecasts makes it difficult to discern if the errors are randomly 
distributed. With the exception of Fifth Brigade, there are no apparent relationships that 
cause concern, such as errors increasing with forecast length or similarities in the 
distributions across brigades. The errors in Fifth Brigade demonstrate a near-linear 
decrease in the first six periods of the forecast series, but the last three periods appear to 
return to a random pattern. None of the forecast errors display significant ACF or partial 
ACF. 

D. DISCUSSION


1. Regressor Coefficient Interpretation 

Analysis of the significant factors in the final reduced models addresses three of
the primary objectives of this thesis: validating factors from previous research, exploring
recent suppositions regarding prominent factors, and enumerating the differences in
the recruiting environment throughout the country. In general, the presence of factors in
the final models and the signs of their respective coefficients do not provide clear insight
into the recruiting process. The significant factors vary considerably across brigades. In
many cases, the signs of the coefficients for the same significant factors are inconsistent
between brigades, and some are inconsistent with prior research. The significant factors
and the regressor coefficient signs are listed in Table 5.3.

At least one USAREC policy variable remains in the final model of each brigade. 
Mission is significant in three brigades and the coefficients’ sign is positive as expected, 
meaning that recruiting production increases when USAREC issues higher quotas. 
Recruiter strength is significant in three brigades. In Fifth Brigade, increasing recruiter 
strength has a measurable positive effect on production. However, in First and Sixth 
Brigades, the sign of the coefficient for this variable is negative. This result initially 
appears counter-intuitive, but examination of the time series graphs reveals that, in these 
brigades, increases in recruiter strength did not provide the desired effect of boosting 
production. The number of eligible recruits receiving the college option is significant in 
three brigades. Like the behavior of the college option time series, the impact is not 
consistent, since the effect of the variable is positive in only two of these brigades. 

The reduced models for each brigade contain at least two factors of the recruiting 
environment. The target population variable is significant in two brigades and for both 
the coefficient sign is negative. This result is counter-intuitive. Unemployment is a 
significant variable in all brigades, which is in line with prior research. However, the 
expected sign of the coefficient (positive) is present in only one brigade. Once again, this 
result may be partially explained by examining the behavior of the production and 
unemployment time series. Over the period of this study, unemployment decreases in all
regions, yet production remains fairly constant. The high school graduate wage level, 
which is also prominent in previous studies, is significant in all brigades. The variable 
coefficients display the expected sign (negative) in only three of the five brigades. The 
college attendance rate appears in the reduced models of two brigades, though the impact 
of the variable on recruiting production is positive and not negative as expected. The 
college premium variable is significant in three brigades and behaves as expected only in 
First Brigade. 

At least one of the rival services’ recruiting production variables appears in the 
final model of each brigade. Air Force high-quality male recruiting contracts are 
significant in three brigades, and are positive predictors in two of these regions. Marine 
Corps and Navy high-quality male recruiting variables are significant in four brigades. 
The Marine Corps production figures are negatively related to Army production figures 
in two brigades. The Navy production variable is the most consistent. Its behavior is 
positively related to Army production in all four brigades. In Sixth Brigade all rival 
service figures are significant and are positive predictors. 

The sign of the coefficients for the monthly indicators behaves as expected in all 
but one case. December is consistently a difficult month, while June is a prolific month 
for recruiting. September is a significant and positive month in four of the brigades. In 
general, the winter months are negative, but not significant in all regions. All of these 
results are consistent with known recruiting production behavior. The negative 
coefficient for the July indicator in Sixth Brigade’s final model represents the one 
counter-intuitive result for the binary monthly variables. 

2. Model Form 

It is difficult to draw conclusions regarding the first three thesis objectives 
because it is unclear whether the inconsistencies in the final models are due to random 
noise, inappropriate model form, new realities of the recruiting environment, and/or 
legitimate differences in the recruiting process across the country. The final models also 
produce disappointing results regarding the fourth thesis objective, which is to develop an 
accurate tool for forecasting recruiting production. Chatfield provides some insight into a
potential explanation. He states that “building a ‘good’ model from data subject to
feedback can be difficult” (Chatfield, 1996). Feedback exists in systems in which the
outputs in one period affect the inputs in future periods. In a discussion about 
econometric models, he states that in processes with rapid feedback loops, a single 
multivariate equation is less appropriate than modeling the system with multiple 
equations. However, he maintains that a multivariate model may still prove useful if the 
feedback in the system is slow and if the overall system is not well controlled by the 
inputs, as in the case of the economy. 

Chatfield’s observation may provide an explanation of why model accuracy 
improves when advertising data is removed. Clearly recruiting production results in one 
period have an impact on decisions about advertising expenditures in future periods. 
Concern about feedback raises the issue of whether all factors under USAREC’s control 
should be eliminated from the model or whether multiple models need to be developed. 
Removal of all variables representing policies and resources under USAREC’s control is 
unappealing because the resultant models would not provide any basis for analyzing 
policy and resource allocation. The feedback loop from production results to changes in 
recruiter strength is much slower than the response time in advertising feedback, which is 
an argument for maintaining this factor in model development. Based on the rapid 
feedback of production results on future missions and the ease with which missions can 
be changed, there may be a legitimate claim that inclusion of this variable has the 
potential to induce bias in the model coefficients. However, the reduced models that 
contain mission as a significant variable are no more or less accurate than those in which 
this factor is eliminated during the reduction process. 

VI. CONCLUSIONS AND RECOMMENDATIONS


A. SUMMARY 

The United States Army is currently experiencing the greatest recruiting shortfalls 
in the history of the All-Volunteer Force. The service faces unprecedented competition 
for young people as unemployment is at its lowest level in thirty years and college 
attendance rates are the highest in American history. Under these conditions, USAREC 
requires tools to quantify the impact of factors in the recruiting environment, to identify 
differences in the recruiting processes across its five regional subordinate units, and to 
measure the effectiveness of its policies and resource expenditures. 

This thesis examined recruiting data from July 1992 to September 1997, a very 
dynamic period for the Army and Recruiting Command. The scope was limited to the 
high-quality male demographic, defined as a recruit who scored above the 50th percentile 
on the AFQT and who is a high school graduate or GED holder. This thesis aimed to do 
the following: 

1. Validate factors from previous models on more current data. 

2. Explore suppositions regarding new influences on the recruiting environment. 

3. Enumerate the differences in the recruiting environment throughout the 
country. 

4. Develop an accurate tool for predicting recruiting production. 

Multivariate time series analysis was used to predict the number of enlistment 
contracts signed in a month as a function of exogenous and endogenous factors plus 
monthly indicators. Fifteen factors were initially included for examination in this study as 
predictive variables. They were selected based on their appearance in previous models or 
in recent research. The bootstrap was used to overcome the difficulties in determining
significant factors presented by the short duration of the recruiting data time series. This
technique allowed resampling from within the existing data to provide robustness in the
factor determination process. A stepwise recursion was developed to eliminate from the
time series models factors that were not statistically significant. The factors remaining in
the reduced models were compared to those found to be significant in past research. The 
developed models were also used to create nine-month projections of recruiting 
production. The results were then compared to known production figures from test set 
data to determine forecast accuracy levels. 

The final models indicated that unemployment figures and high school graduate 
wage levels are significant factors for predicting recruiting production. These results were 
consistent with findings from previous studies. However, the impact of these two factors 
was not clearly interpretable across the five recruiting brigades. In some brigades the 
effect of these variables on recruiting production was positive, while in other brigades the 
effect was negative. No consistent factors for measuring the competition between the 
Army and post-secondary schooling emerged from the model development process. The 
final models did successfully capture the seasonal nature of recruiting. There were
considerable differences in the final model for each brigade, which, despite probable noise
in the data, indicated that influential predictors of recruiting production differ regionally.
The forecasts produced using the final models captured the general behavior of the 
recruiting production series in the test period, but overall their accuracy was 
disappointing. 

B. CONCLUSIONS 

The 1990s were a dynamic period for U.S. Army recruiting. Predictions based on
the past are dependent on the assumption that past patterns within each series and the
relationships between series remain the same. Clearly, the recruiting process and
environment were evolving rapidly over the period of this study. This evolution created
noise in the data used in this thesis. Noise was likely also induced by the transformations
required to convert some of the available data into a usable form for time series analysis.
The results do support the intuition that the influential factors differ by region, a subject
not addressed in the previous studies reviewed. Though the models developed in this
thesis may represent a descriptive tool for what occurred in the recruiting process during
the period studied, they lack the forecasting accuracy to provide legitimate opportunities
for “what if” analysis or optimization.

The most meaningful contribution of this thesis is the development of the 
stepwise recursion using bootstrap simulation for identifying significant factors in 
multivariate time series analysis. It proved to be a useful tool for providing robustness in 
a situation where data was limited. This methodology offers potential for further 
refinement and application. 
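
The elimination criterion at the heart of the stepwise recursion can be sketched in Python as below. It follows the logic visible in the S-Plus listing in Appendix B: a column of bootstrap coefficients is a candidate for removal only if 0 lies within its mean +/- SD.range * sd, and the candidate with the largest share of values close to 0 is discarded. The function name and the demonstration data are hypothetical, and dividing by the number of non-missing values (rather than by n) is a simplification made here.

```python
import numpy as np

def column_to_discard(boot_coefs, sd_range=1.0):
    """Index of the regressor to drop next, or None if no interval contains 0."""
    coefs = np.asarray(boot_coefs, dtype=float)   # shape: (bootstrap trials, regressors)
    proportions = np.zeros(coefs.shape[1])
    for i in range(coefs.shape[1]):
        col = coefs[:, i]
        col = col[~np.isnan(col)]                 # drop trials that did not converge
        mean, sd = col.mean(), col.std(ddof=1)
        if mean - sd_range * sd <= 0.0 <= mean + sd_range * sd:
            # Share of bootstrap values within sd_range * sd of zero.
            proportions[i] = np.mean(np.abs(col) < sd_range * sd)
    if proportions.max() == 0.0:
        return None, proportions
    return int(proportions.argmax()), proportions

# Made-up bootstrap output: column 0 is clearly non-zero, column 1 straddles zero.
boot = np.column_stack([np.linspace(4.0, 6.0, 11), np.linspace(-1.0, 1.0, 11)])
index, proportions = column_to_discard(boot)
print(index, proportions)
```

Running the bootstrap again after each removal, until `column_to_discard` returns `None` or an iteration limit is reached, gives the backward elimination loop.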

C. RECOMMENDATIONS FOR FURTHER RESEARCH 

Recommendations for improvements to the recruiting research fall in two 
categories: data collection and data handling. Future studies should attempt to have all 
predictive data specific to the targeted demographic. For example, college attendance 
rates, unemployment, and wage figures should address 17-24 year-old males only. All 
original data should consist of monthly observations. Additional indicator variables 
should be developed to represent additional policies, organizational structure, incentive 
programs, and specific events, such as military conflicts or government shutdowns. 
Multiple variables could be combined into single variable representations and
cointegration vectors could be developed. Advanced time series analysis techniques should
be explored including smoothing and filtering.

The stepwise reduction recursion using bootstrap simulation merits further
exploration. A first step should be testing the method with data for which accurate results
have been determined through other techniques. An examination of the impact of
changing the α values, the parameter for controlling the tolerance for which variables are
removed from the model, is warranted. A method that uses incremental changes in the α
values could be developed to rank order the significance of factors. Additionally, the
number of model subsets examined in the model development process could be expanded
by the creation of a forward addition stepwise recursion. These recommendations offer
means to further develop the potential shown by this technique for multivariate time
series analysis.
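
One possible reading of the rank-ordering idea is sketched below in Python. It is hypothetical and not a method developed in this thesis: it assumes the tolerance parameter is the multiplier called SD.range in Appendix B, and it ranks each factor by the smallest multiplier at which the interval mean +/- multiplier * sd first contains 0, which equals |mean| / sd.

```python
import numpy as np

def rank_by_tolerance(boot_coefs, names):
    """Rank factors by the smallest tolerance at which they would become removable."""
    coefs = np.asarray(boot_coefs, dtype=float)
    mean = np.nanmean(coefs, axis=0)
    sd = np.nanstd(coefs, axis=0, ddof=1)
    critical = np.abs(mean) / sd       # larger value: survives a wider tolerance
    order = np.argsort(-critical)
    return [(names[i], float(critical[i])) for i in order]

# Made-up bootstrap coefficients for three hypothetical factors A, B, C.
boot = np.column_stack([np.linspace(4.0, 6.0, 11),
                        np.linspace(-1.0, 1.0, 11),
                        np.linspace(0.0, 2.0, 11)])
ranking = rank_by_tolerance(boot, ["A", "B", "C"])
print(ranking)
```

Stepping the tolerance upward in small increments and recording when each factor first drops out would give the same ordering, provided the bootstrap is not rerun between steps.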

APPENDIX A. ORIGINAL DATA SERIES SUMMARIES


First Brigade Data Summary

Second Brigade Data Summary

Third Brigade Data Summary

Fifth Brigade Data Summary

Sixth Brigade Data Summary

APPENDIX B. S-PLUS CODE


Bootstrap Recursion S-Plus Code 


function(mod, y, xreg, n = 250, parametric = F)
{
#
# bootstrapSim: Simulate ARIMA models for the bootstrap
#
# args: mod:        arima model
#       y:          response series (looked up from mod$series if missing)
#       xreg:       matrix of regression dependent variables
#       n:          Number of trials
#       parametric: Use parametric (Normal-based) bootstrap? Default F
#
    if(class(mod) != "arima") stop("This function operates on an arima model")
#
# Set up output
#
    this.is.ar <- this.is.ma <- F
    xreg.out <- matrix(NA, n, length(mod$reg.coef))
    if(any(names(mod$model) == "ar")) {
        this.is.ar <- T
        ar.out <- matrix(0, n, mod$model$order[1])
    }
    if(any(names(mod$model) == "ma")) {
        this.is.ma <- T
        ma.out <- matrix(0, n, mod$model$order[3])
    }
#
# Extract residuals
#
    resids <- arima.diag(mod, resid = T, plot = F)$resid
    r.len <- length(resids)
#
# If data is missing, try to find it.
#
    if(missing(xreg)) {
        name <- as.character(mod$call)[4]
        if(!exists(name))
            stop(paste("Can't find regressors in", name))
        xreg <- get(name)
    }
    if(missing(y)) {
        name <- mod$series
        if(!exists(name))
            stop(paste("Can't find y data in", name))
        y <- get(name)
    }
#
# Main loop
#
    if(mod$model$order[1] != 0) {
        skip <- mod$model$order[1]
        first.resids <- y[1:skip] - xreg[1:skip, ] %*% mod$reg.coef
#
# The previous step accounts for the start-up cost of AR models and
# fills in the p missing residuals with approximations.
#
        for(i in 1:n) {
            if(i %% 100 == 0) {
                cat("Operating on loop ", i, "\n")
            }
            if(parametric) {
                cat("parametric\n")
                resid.sd <- sqrt(var(resids))
                new.y <- arima.sim(mod$model, xreg = xreg,
                    reg.coef = mod$reg.coef, innov
                     = rnorm(n = length(resids), sd = resid.sd))
            }
            else {
#
# Non-parametric: keep the p start-up residuals and resample the rest.
# The simulation call sits inside this branch so that the parametric
# draw above is not overwritten.
#
                new.resids <- c(first.resids,
                    resids[sample((skip + 1):r.len, replace = T)])
                new.y <- arima.sim(mod$model, xreg = xreg,
                    reg.coef = mod$reg.coef, innov = new.resids)
            }
            new.model <- arima.mle(new.y, model = mod$model, xreg
                 = xreg, max.fcal = 400, max.iter = 250)
            if(new.model$converged == F) {
                cat("Warning: model", i, "didn't converge!\n")
            }
            else {
                if(this.is.ar)
                    ar.out[i, ] <- new.model$model$ar
                if(this.is.ma)
                    ma.out[i, ] <- new.model$model$ma
                xreg.out[i, ] <- new.model$reg.coef
            }
        }
    }
    else {
        first.resids <- y - xreg %*% mod$reg.coef
        for(i in 1:n) {
            if(i %% 100 == 0) {
                cat("Operating on loop ", i, "\n")
            }
            new.resids <- resids[sample(1:r.len, replace = T)]
            if(parametric) {
                cat("parametric\n")
                resid.sd <- sqrt(var(resids))
                new.y <- arima.sim(mod$model, xreg = xreg,
                    reg.coef = mod$reg.coef, innov
                     = rnorm(n = length(resids), sd = resid.sd))
#
# If "innov" is supplied, it should be a vector,
# e.g. innov = rnorm(n = length(resids), sd = resid.sd)
# If "innov" is NOT supplied, then a vector of innovations is generated
# by the rand.gen function, which, by default, is rnorm. Additional
# arguments to this function can be passed as arguments to arima.sim,
# e.g. [innov = not passed], rand.gen = rnorm, n = length(resids),
# sd = resid.sd
#
            }
            else {
                new.y <- arima.sim(mod$model, xreg = xreg,
                    reg.coef = mod$reg.coef, innov = new.resids)
            }
            new.model <- arima.mle(new.y, model = mod$model,
                xreg = xreg, max.fcal = 300, max.iter = 150)
            if(new.model$converged == F) {
                cat("Warning: model", i, "didn't converge!\n")
            }
            else {
                if(this.is.ma)
                    ma.out[i, ] <- new.model$model$ma
                xreg.out[i, ] <- new.model$reg.coef
            }
        }
    }
    if(this.is.ar)
        if(this.is.ma)
            return(list(Xreg = xreg.out, AR = ar.out,
                MA = ma.out))
        else return(list(Xreg = xreg.out, AR = ar.out))
    else return(list(Xreg = xreg.out, MA = ma.out))
}




Stepwise Reduction Recursion S-Plus Code 


function(mod, y, regressors, n = 1000, parametric = F, SD.range = 1,
    maximumIterations = 1)
{
    #
    # Stepwise: eliminate time-series regressors by backward elimination.
    #
    # Arguments: mod:               arima model
    #            y:                 y data vector (response variable)
    #            regressors:        matrix of regressors
    #            n:                 n to be passed to bootstrapSim
    #            parametric:        to be passed to bootstrapSim
    #            SD.range:          a tolerance for deciding when to stop deleting
    #                               columns. Stop deleting regressors when no column
    #                               of regression coefficients has "0" in the range
    #                               (mean +/- SD.range * SD).
    #            maximumIterations: maximum number of times to run through the
    #                               discarding loop. Default is 1.
    #
    stillDiscarding <- T
    counter <- 0
    #
    # Save "y" in frame 1 so "arima.diag" and others can find it if needed
    #
    assign("y", y, frame = 1)
    while(stillDiscarding && counter < maximumIterations) {
        counter <- (counter + 1)
        #
        # Print the current columns and run bootstrapSim.
        #
        cat("Loop ", counter, ", cols are ",
            dimnames(regressors)[[2]], "\n")
        cat("Calling bootstrapSim to execute bootstrap \n")
        bsRegCoefs <- bootstrapSim(mod, y, regressors, n = n,
            parametric = parametric)$Xreg
        proportion <- vector("single", ncol(bsRegCoefs))

# 

# For each column of the resulting bootstrap regression coefficients, 

# find the proportion of values that fall between 0 and (2 * the mean 

# of the column). The objective is to discard the column with the 

# largest such proportion; that's the column in which most of the 

# values are close to 0. Each column of bsRegCoefs corresponds to a 

# factor in the time series. 

# 

for(i in 1:(ncol(bsRegCoefs) -1)) { 
cat("Examining factor ", 

dimnames(regressors)[[2]][i], "\n") 
numerator <- 0 

x.bar <- mean(bsRegCoefs[, i], na.rm = T) 
if(na.sum <- sum(is.na(bsRegCoefs[, i])))
    cat(paste("Encountered", na.sum, "missing coeffs\n"))
sd <- sqrt(var(bsRegCoefs[, i], na.method = "omit"))

# 

# If 0 is in the range of the mean +/- SD.range * sd, calculate

# proportion. If not, assign that column a proportion of 0. 

# 


if(x.bar - SD.range * sd <= 0 && x.bar + SD.range *
    sd >= 0) {
    for(j in 1:n) {
        # Skip coefficients from bootstrap fits that did not converge
        if(!is.na(bsRegCoefs[j, i])) {
            if(bsRegCoefs[j, i] < SD.range * sd
                && bsRegCoefs[j, i] > - (SD.range
                * sd)) {
                numerator <- numerator + 1
            }
        }
    }
    proportion[i] <- numerator/n
    cat("Proportion is ", proportion[i], "\n")
}
else proportion[i] <- 0


# After examining all factors, choose the factor which has the highest 

# proportion. If all proportions are equal to 0, set flag to exit 

# "while" loop. 

# 


} 


if(all(proportion == 0)) { 

cat("No additional factors discarded \n") 
stillDiscarding <- F 

} 

else { 

max.index <- (1:length(proportion))[proportion ==
    max(proportion)][1]
cat("Discarding ",
    dimnames(regressors)[[2]][max.index], "\n")
# drop = F keeps "regressors" a matrix when one column remains
regressors <- regressors[, - max.index, drop = F]
assign("regressors", regressors, frame = 1)
mod <- arima.mle(y, model = list(order =
    mod$model$order), xreg = regressors, max.fcal = 300,
    max.iter = 150)

} 

} 

cat("final significant factors are: ", dimnames(regressors)[[2]],
    " \n")
return(regressors)
}

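
The selection rule inside the discarding loop can be sketched in isolation. The Python below is an illustrative restatement under stated assumptions, not the thesis implementation: the function name discard_candidate is hypothetical, each regressor's bootstrap coefficients are assumed to arrive as a list with None marking non-converged fits, and every column of coefficients is examined.

```python
import statistics


def discard_candidate(columns, sd_range=1.0):
    """Pick the column whose bootstrap coefficients cluster most around 0.

    Returns (index, proportions); index is None when no column's
    mean +/- sd_range * sd interval contains 0. Hypothetical helper.
    """
    proportions = []
    for col in columns:
        values = [v for v in col if v is not None]  # ignore missing fits
        x_bar = statistics.mean(values)
        half_width = sd_range * statistics.stdev(values)
        if x_bar - half_width <= 0 <= x_bar + half_width:
            # Count coefficients strictly within half_width of 0,
            # divided by the total number of bootstrap replicates.
            near_zero = sum(1 for v in values if -half_width < v < half_width)
            proportions.append(near_zero / len(col))
        else:
            proportions.append(0.0)
    if all(p == 0 for p in proportions):
        return None, proportions
    # index() returns the first maximum, matching the [1] tie-break above.
    return proportions.index(max(proportions)), proportions


if __name__ == "__main__":
    coefs = [[5.0, 5.2, 4.8, 5.1], [0.1, -0.2, 0.05, -0.1]]
    print(discard_candidate(coefs))
```

A column whose coefficients sit well away from zero receives a proportion of 0 and is never discarded; the loop stops once every column is in that state.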
Forecasting Simulation S-Plus Code


function(mod, N, XREG, regressionCoef, loops, ...)
{
    #
    # makePrediction: generates multiple ARIMA simulations to predict a
    #                 univariate time series. Returns the mean and sd of
    #                 the predictions for each period in the series.
    #                 Also plots a histogram of the simulated values for
    #                 the first period in the predicted time series. The
    #                 ... notation allows the user to specify how the
    #                 innovations for arima.sim are created.
    #
    # args: mod:            the order for the ARIMA model
    #       N:              the number of periods in the desired time series
    #       XREG:           a matrix of regression variable values
    #       regressionCoef: a vector of regression coefficients corresponding to
    #                       XREG
    #       loops:          the number of simulations to run
    #
    # One row per simulation, one column per forecast period
    predOut <- matrix(nrow = loops, ncol = N)
    for(i in 1:loops) {
        x <- arima.sim(model = mod, n = N, xreg = XREG,
            reg.coef = regressionCoef, ...)
        predOut[i, ] <- x
    }
    # Column-wise summaries; names avoid masking mean(), var() and sd
    pred.mean <- apply(predOut, 2, mean)
    pred.sd <- sqrt(apply(predOut, 2, var))
    hist(predOut[, 1], nclass = 20)
    return(list(mean = pred.mean, sd = pred.sd))
}

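
The summary step of the forecasting simulation, taking the mean and standard deviation across simulated paths for each forecast period, can be shown with a small Python sketch. The function name summarize_paths and the toy input are assumptions for illustration; the simulated paths themselves would come from the ARIMA simulation, which is not reproduced here.

```python
import statistics


def summarize_paths(paths):
    """Return per-period means and sample standard deviations.

    paths is a list of simulated series, one list per simulation,
    all of the same length. Hypothetical helper for illustration.
    """
    periods = list(zip(*paths))  # transpose: one tuple per forecast period
    means = [statistics.mean(p) for p in periods]
    sds = [statistics.stdev(p) for p in periods]
    return means, sds


if __name__ == "__main__":
    simulated = [[1.0, 10.0], [3.0, 14.0]]
    means, sds = summarize_paths(simulated)
    print(means, sds)
```

The sample standard deviation (n - 1 denominator) is used here so the result matches the square root of the S-Plus var applied to each column.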
APPENDIX C. ORIGINAL DATA SERIES CORRELOGRAMS

First Brigade Autocorrelation Function Correlograms 


[Figure: correlogram panels for H-Q Male Recruiting Production, H-Q Male Recruiting Mission, Recruiter Strength, and TV Ad Impressions]